Adds the number of occurrences of a word or multi-word expression in a quanteda tokens object to a data.frame. By default, it checks whether 'hits' have been counted before and only fills in missing values. Supports glob-style wildcards (e.g. "steuer*").
Arguments
- word_df
A data.frame containing words.
- tokens
A word-tokens object, returned by quanteda::tokens.
- word_field
Character. Default is "words". Name of the column in word_df that contains the words.
- replace
Logical. Default is FALSE. If FALSE, fills in 'hits' only for missing observations. If TRUE, counts 'hits' for all words in word_df.
Value
A data.frame with a column 'hits' indicating the frequency of each term in the tokens, arranged by descending number of hits.
Examples
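library(magrittr)  # provides %>% and %<>%
# Keep the first 100 rows and clean the raw text
tw_data %<>% head(100) %>% clean_text(text_field = 'full_text')
toks <- quanteda::tokens(tw_data$text)
# Count hits for a multi-word expression, a wildcard pattern, and an unmatched word
word_df <- data.frame(words = c("der deutschen", "steuer*", "xyz")) %>%
  get_hits(tokens = toks)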
#> [1] "Counting word occurrences..."
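The fill-in behaviour of replace and the glob-style matching described above can be illustrated with a minimal base-R sketch. This is not the package's implementation: count_hits_sketch is a hypothetical helper, the tokens are assumed to be a plain character vector rather than a quanteda tokens object, and matching is assumed to be case-insensitive.

```r
# Hypothetical illustration only; get_hits() itself may work differently.
count_hits_sketch <- function(word_df, toks, word_field = "words", replace = FALSE) {
  # Create the 'hits' column if it does not exist yet.
  if (!"hits" %in% names(word_df)) word_df$hits <- NA_integer_

  # With replace = FALSE only rows without a count are processed.
  todo <- if (replace) rep(TRUE, nrow(word_df)) else is.na(word_df$hits)

  # Count the positions where every part of a (multi-word) pattern matches.
  count_one <- function(pattern) {
    parts <- strsplit(pattern, " ", fixed = TRUE)[[1]]
    n <- length(parts)
    if (length(toks) < n) return(0L)
    starts <- seq_len(length(toks) - n + 1L)
    ok <- rep(TRUE, length(starts))
    for (i in seq_len(n)) {
      rx <- utils::glob2rx(parts[i])  # translate the glob pattern to a regex
      ok <- ok & grepl(rx, toks[starts + i - 1L], ignore.case = TRUE)
    }
    sum(ok)
  }

  patterns <- as.character(word_df[[word_field]])[todo]
  word_df$hits[todo] <- vapply(patterns, count_one, integer(1), USE.NAMES = FALSE)

  # Arrange by descending number of hits.
  word_df[order(-word_df$hits), , drop = FALSE]
}

toks_demo <- c("die", "steuer", "der", "deutschen", "steuerzahler")
word_demo <- data.frame(words = c("der deutschen", "steuer*", "xyz"),
                        hits = c(NA, 5L, NA))

# Keeps the existing count for "steuer*" and fills in the two missing values.
filled <- count_hits_sketch(word_demo, toks_demo)

# Recounts every row, overwriting the existing count.
recounted <- count_hits_sketch(word_demo, toks_demo, replace = TRUE)
```

In this sketch a previously stored count is trusted as-is when replace is FALSE, which saves work on large token sets but means stale counts survive until replace = TRUE is used.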