
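Efficiently computes F1 scores for keywords or dictionaries stored in a data.frame, evaluated against the binary reference column of text_df separately for each level of group_field. keyword_df can hold either a character vector of single keywords or a list of dictionaries. The result is keyword_df with one added F1 column per group level, named F1_<level> (see Examples).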
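
For orientation, a per-group F1 score can be sketched in a few lines of base R. This is a minimal sketch, assuming binary predictions have already been derived for each document; it does not reproduce how dictvectoR turns a dictionary and the fastText model into predictions. The helper f1_by_group() is hypothetical, and scoring an undefined precision or recall as 0 is an assumption of this sketch.

```r
# Hypothetical helper (not part of dictvectoR): F1 of binary predictions
# against a binary reference, computed separately within each group.
f1_by_group <- function(pred, ref, group) {
  stopifnot(length(pred) == length(ref), length(ref) == length(group))
  f1 <- function(p, r) {
    tp   <- sum(p & r)                          # true positives
    prec <- if (sum(p) > 0) tp / sum(p) else 0  # precision (0 if undefined)
    rec  <- if (sum(r) > 0) tp / sum(r) else 0  # recall (0 if undefined)
    if (prec + rec == 0) 0 else 2 * prec * rec / (prec + rec)
  }
  # split row indices by group, then score each subset
  sapply(split(seq_along(ref), group),
         function(i) f1(pred[i] == 1, ref[i] == 1))
}

pred  <- c(1, 0, 1, 1, 0, 0)
ref   <- c(1, 0, 0, 0, 1, 0)
group <- c("A", "A", "A", "B", "B", "B")
f1_by_group(pred, ref, group)  # A = 2/3, B = 0
```

Returning a named vector, one entry per group, mirrors the one-column-per-group layout of the real output.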

Usage

get_many_F1s_by_group(
  keyword_df,
  keyword_field = "words",
  id = "id",
  model,
  text_df,
  group_field,
  reference,
  text_field = "text",
  replace_na = c("mean-sd", "min", 0, F)
)

Arguments

keyword_df

A data.frame containing a column with either a character vector of single keywords or a list of dictionaries.

keyword_field

character. Name of the column in keyword_df holding either a character vector of single keywords or a list of dictionaries, each dictionary stored as a character vector with one element per keyword. Default is "words".
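
The two accepted shapes of keyword_df can be sketched as follows, reusing keywords from the example. Naming the column words matches the default keyword_field, so that argument could then be omitted. The object names kw_single and kw_dicts are illustrative only.

```r
# Single keywords: one character element per row
kw_single <- data.frame(id = 1:3,
                        words = c("merkel", "skandal", "groko"),
                        stringsAsFactors = FALSE)

# Dictionaries: a list column, each element a character vector of keywords
kw_dicts <- data.frame(id = 1:2)
kw_dicts$words <- list(c("mehrheit deutschen", "merkel", "skandal"),
                       c("arbeitnehmer", "groko", "wahnsinn"))
```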

id

character. Name of the column in keyword_df containing a unique identifier for each keyword or dictionary. Default is "id".

model

A fastText model, loaded with fastrtext::load_model().

text_df

A data.frame containing one annotated document per row.

group_field

character. Name of a categorical grouping variable in text_df; F1 scores are computed separately for each of its levels.

reference

character. Name of the binary reference column in text_df.

text_field

character. Name of the column in text_df that contains the text of the documents. Default is "text".

replace_na

Specifies the value used to replace NAs in the DDR measurement. Can take the values:

  • 'mean-sd' (character): replace NAs by the mean minus one standard deviation. Default.

  • 'min' (character): replace NAs by the minimum.

  • 0 (numeric): replace NAs by 0.

  • FALSE (logical): do not replace NAs.
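
The replace_na options can be illustrated on a plain numeric vector. The helper replace_na_sketch() is hypothetical and is not dictvectoR's own code; it assumes 'mean-sd' uses R's sample standard deviation (sd()), which the documentation does not specify.

```r
# Hypothetical helper illustrating the documented replace_na options.
replace_na_sketch <- function(x, replace_na = "mean-sd") {
  if (isFALSE(replace_na)) return(x)               # FALSE: keep NAs as they are
  fill <- if (identical(replace_na, "mean-sd")) {
    mean(x, na.rm = TRUE) - sd(x, na.rm = TRUE)    # mean minus one (sample) SD
  } else if (identical(replace_na, "min")) {
    min(x, na.rm = TRUE)                           # smallest observed value
  } else if (is.numeric(replace_na)) {
    replace_na                                     # a fixed number, e.g. 0
  } else {
    stop("unsupported replace_na value")
  }
  x[is.na(x)] <- fill
  x
}

ddr <- c(0.2, NA, 0.5, 0.5, 0.8)
replace_na_sketch(ddr, "mean-sd")  # NA becomes 0.5 - sd(c(0.2, 0.5, 0.5, 0.8))
replace_na_sketch(ddr, "min")      # NA becomes 0.2
```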

Examples

library(dictvectoR)
library(magrittr)  # provides the %<>% assignment pipe
model <- fastrtext::load_model(system.file("extdata",
                                           "tw_demo_model_sml.bin",
                                           package = "dictvectoR"))
tw_annot %<>% clean_text(text_field = "full_text")
dict_df <- data.frame(id = 1:3)
dict_df$combis <- list(c("mehrheit deutschen", "merkel", "skandal"),
                       c("steuerzahler", "bundesregierung",
                         "komplett gescheitert"),
                       c("arbeitnehmer", "groko", "wahnsinn"))
get_many_F1s_by_group(keyword_df = dict_df,
                     keyword_field = "combis",
                     id = "id",
                     model = model,
                     text_df = tw_annot,
                     group_field = "party",
                     reference = 'pop')
#> Joining, by = "id"
#>   id                                              combis    F1_AfD F1_B90Grune
#> 1  1                 mehrheit deutschen, merkel, skandal 0.6447368           0
#> 2  2 steuerzahler, bundesregierung, komplett gescheitert 0.6666667           0
#> 3  3                       arbeitnehmer, groko, wahnsinn 0.6953642           0
#>   F1_CDU F1_CSU F1_FDP  F1_Linke F1_SPD
#> 1      0      0      0 0.0000000      0
#> 2      0      0      0 0.4210526      0
#> 3      0      0      0 0.3548387      0
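
The wide result, one F1_<group> column per group, lends itself to picking the best-scoring dictionary for each group. The sketch below rebuilds three of the printed columns by hand so it runs on its own; in practice the same code would be applied to the data.frame returned by get_many_F1s_by_group(). Reporting groups where every F1 is 0 as NA is a choice made for this sketch.

```r
# Three F1 columns copied from the printed example output above.
res <- data.frame(id       = 1:3,
                  F1_AfD   = c(0.6447368, 0.6666667, 0.6953642),
                  F1_CDU   = c(0, 0, 0),
                  F1_Linke = c(0, 0.4210526, 0.3548387))

f1_cols <- grep("^F1_", names(res), value = TRUE)
# For each group column, the id of the highest-scoring row (NA if all F1s are 0)
best <- sapply(f1_cols, function(col) {
  if (max(res[[col]]) > 0) res$id[which.max(res[[col]])] else NA_integer_
})
best  # F1_AfD = 3, F1_CDU = NA, F1_Linke = 2
```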