Get occurrence frequency of words — get_hits

Adds the number of occurrences of a word or multi-word expression in a quanteda tokens object to a data.frame. By default, it checks whether 'hits' have already been counted and only fills in missing values. Words may contain glob-style wildcards (for example "steuer*").
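To make the matching behaviour concrete, here is a minimal base R sketch of counting a glob-style, possibly multi-word pattern in a vector of tokens. It is not the package's implementation: `count_glob_hits` is a hypothetical helper, the token vector is made up, and case-insensitive matching is an assumption.

```r
# Hypothetical helper, not part of the documented package:
# count how often a (possibly multi-word) glob pattern occurs in a token vector.
count_glob_hits <- function(pattern, tokens, ignore_case = TRUE) {
  parts <- strsplit(pattern, " ", fixed = TRUE)[[1]]
  n <- length(parts)
  if (n == 0 || length(tokens) < n) return(0L)
  # Candidate start positions for a window of n consecutive tokens
  starts <- seq_len(length(tokens) - n + 1)
  matched <- rep(TRUE, length(starts))
  for (i in seq_len(n)) {
    # Translate the glob into an anchored regex, e.g. "steuer*" -> "^steuer"
    rx <- utils::glob2rx(parts[i])
    matched <- matched & grepl(rx, tokens[starts + i - 1], ignore.case = ignore_case)
  }
  sum(matched)
}

# Made-up tokens for illustration
toks <- c("die", "Steuern", "der", "deutschen", "Steuerzahler", "der", "deutschen")
count_glob_hits("steuer*", toks)        # 2: "Steuern" and "Steuerzahler"
count_glob_hits("der deutschen", toks)  # 2: the two-token sequence occurs twice
count_glob_hits("xyz", toks)            # 0: no match
```

A multi-word expression only counts when all of its parts match consecutive tokens, which is why the sketch checks a sliding window rather than each word separately.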

Usage

get_hits(word_df, tokens, word_field = "words", replace = FALSE)

Arguments

word_df: A data.frame containing words.
tokens: A word-tokens object, returned by quanteda::tokens.
word_field: Character. Default is "words". Name of the column in word_df that contains the words.
replace: Logical. Default is FALSE. If FALSE, fills in 'hits' only for words whose count is missing. If TRUE, counts 'hits' for all words in word_df.
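The effect of `replace` can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the function's actual source: `fill_hits` and `count_fun` are hypothetical names, missing counts are assumed to be stored as NA, and the counts are made up.

```r
# Hypothetical sketch of the `replace` logic; `count_fun` stands in for
# whatever routine actually counts a word in the tokens.
fill_hits <- function(word_df, count_fun, word_field = "words", replace = FALSE) {
  if (replace || !"hits" %in% names(word_df)) {
    word_df$hits <- NA_integer_  # discard (or create) the column, count everything
  }
  todo <- is.na(word_df$hits)    # only rows without a count are processed
  words <- as.character(word_df[[word_field]][todo])
  word_df$hits[todo] <- vapply(words, count_fun, integer(1), USE.NAMES = FALSE)
  word_df
}

# Made-up counts used in place of a real tokens object
counts <- c("steuer*" = 12L, "der deutschen" = 3L, "xyz" = 0L)
count_fun <- function(w) counts[[w]]

df <- data.frame(words = names(counts), hits = c(99L, NA, NA),
                 stringsAsFactors = FALSE)
fill_hits(df, count_fun)$hits                  # 99 3 0: the existing value is kept
fill_hits(df, count_fun, replace = TRUE)$hits  # 12 3 0: every word is recounted
```

Leaving `replace` at FALSE avoids recounting words that already have a value, which matters when new words are appended to an existing word list.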

Value

A data.frame with a column 'hits' giving the frequency of each term in the tokens, arranged by descending number of hits.
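As a rough picture of the returned shape, the following base R sketch builds a small result by hand and orders it the same way. The data are made up, and the zero count for an unmatched word is an assumption about how such words are reported.

```r
# Made-up result for illustration; real counts depend on the tokens object
res <- data.frame(words = c("xyz", "steuer*", "der deutschen"),
                  hits = c(0L, 12L, 3L),
                  stringsAsFactors = FALSE)

# Arrange by descending number of hits
res <- res[order(res$hits, decreasing = TRUE), ]
rownames(res) <- NULL
res$words                 # "steuer*" "der deutschen" "xyz"

# Words that never occur in the tokens
res$words[res$hits == 0]  # "xyz"
```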

Examples

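library(magrittr)  # provides the %>% and %<>% pipes used below

# Keep the first 100 rows of tw_data and clean the 'full_text' column
tw_data %<>% head(100) %>% clean_text(text_field = 'full_text')
# Tokenize the text column
toks <- quanteda::tokens(tw_data$text)
# Count hits for a multi-word expression, a wildcard pattern and a plain word
word_df <- data.frame(words = c("der deutschen", "steuer*", "xyz")) %>%
  get_hits(tokens = toks)
#> [1] "Counting word occurrences..."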