Returns the average word-vector representation of a text column in a data frame, using a fastText model.

Usage

get_corpus_representation(df, model, text_field = "text", normalize = TRUE)

Arguments

df

A data.frame containing a text column whose name is given by text_field.

model

A fastText model, loaded by fastrtext::load_model().

text_field

A character string giving the name of the text column in df. Defaults to "text".

normalize

Logical. If TRUE (the default), the vectors are scaled to unit Euclidean (L2) norm.
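
To make the averaging and normalization concrete, here is a minimal base R sketch on made-up three-dimensional word vectors. It is not the package's implementation: the helper l2_normalize() is hypothetical, and normalizing the averaged vector (rather than each word vector before averaging) is an assumption, since this page does not say at which step normalization is applied.

```r
# Toy word vectors: one row per word, three dimensions (made-up numbers).
word_vecs <- rbind(
  tax    = c(3, 0, 4),
  reform = c(0, 5, 12)
)

# Hypothetical helper, not part of dictvectoR: scale a vector to unit L2 norm.
l2_normalize <- function(v) v / sqrt(sum(v^2))

# Average the word vectors dimension by dimension.
avg_vec <- colMeans(word_vecs)

# Assumed order of operations: normalize the averaged vector.
unit_vec <- l2_normalize(avg_vec)

# The Euclidean norm of the normalized vector is 1, up to floating-point error.
sqrt(sum(unit_vec^2))
```

Unit-length vectors are convenient when representations are later compared by cosine similarity, because the dot product of two unit vectors equals their cosine similarity.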

Value

A single-row sparse matrix of class dgCMatrix, as returned by Matrix.
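
For readers unfamiliar with this return type, the following sketch builds a single-row dgCMatrix from a made-up numeric vector using the Matrix package and converts it back to a plain vector. The values are purely illustrative, and whether the function builds its result this way internally is an assumption.

```r
library(Matrix)

# A made-up dense vector standing in for an averaged word vector.
v <- c(0.1, 0, 0.3)

# Store it as a sparse matrix with one row and one column per dimension.
m <- Matrix(v, nrow = 1, sparse = TRUE)

class(m)  # sparse column-compressed numeric matrix class
dim(m)    # one row, length(v) columns

# Convert back to an ordinary numeric vector when needed.
dense <- as.matrix(m)[1, ]
```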

Examples

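library(dictvectoR)
library(magrittr)  # provides the %>% pipe
# Load the small demo fastText model shipped with dictvectoR
model <- fastrtext::load_model(system.file("extdata",
                                           "tw_demo_model_sml.bin",
                                           package = "dictvectoR"))
# Keep the first 15 rows of the tw_annot example data and clean the text column
tw_annot <- tw_annot %>% head(15) %>% clean_text(text_field = "full_text")
corpus_rep <- get_corpus_representation(tw_annot, model)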