Returns the F1 score of a DDR (Distributed Dictionary Representation) measurement when it is used to predict a binary reference (i.e. a manually annotated variable).

Usage

get_F1(
  df,
  dictionary,
  model,
  reference,
  text_field = "text",
  replace_na = c("mean-sd", "min", 0, F)
)

Arguments

df

A data.frame containing one annotated document or sentence per row.

dictionary

A character vector containing the keywords of a dictionary, passed to cossim2dict.

model

A fasttext model as loaded by load_model.

reference

Name of the binary reference column in df.

text_field

Name of the column in df that contains the text of the documents. Default is "text".

replace_na

Specifies the value used to replace NAs in the DDR measurement. Default is 'mean-sd'. Possible values:

  • 'mean-sd' (character): replace NAs by the mean minus one standard deviation. Default.

  • 'min' (character): replace NAs by the minimum.

  • 0 (numeric): replace NAs by 0.

  • FALSE (logical): do not replace NAs.
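
The four replace_na options can be illustrated with a minimal sketch. The helper replace_na_sketch() below is hypothetical and not part of dictvectoR; it assumes that the mean, standard deviation and minimum are computed over the non-missing values, which the documentation above does not state explicitly.

```r
# Hypothetical helper (not part of dictvectoR): illustrates the four
# replace_na options on a plain numeric vector.
# Assumption: summary statistics are taken over the non-missing values.
replace_na_sketch <- function(x, replace_na = "mean-sd") {
  if (isFALSE(replace_na)) {
    return(x)  # FALSE: leave NAs untouched
  }
  fill <- if (identical(replace_na, "mean-sd")) {
    mean(x, na.rm = TRUE) - sd(x, na.rm = TRUE)
  } else if (identical(replace_na, "min")) {
    min(x, na.rm = TRUE)
  } else if (is.numeric(replace_na)) {
    replace_na  # e.g. 0
  } else {
    stop("unsupported replace_na value")
  }
  x[is.na(x)] <- fill
  x
}

scores <- c(0.2, 0.4, NA, 0.6)
replace_na_sketch(scores, "mean-sd")  # NA is filled with mean - sd
replace_na_sketch(scores, "min")      # NA is filled with the minimum
replace_na_sketch(scores, 0)          # NA is filled with 0
replace_na_sketch(scores, FALSE)      # NA is kept
```

With FALSE the NA is carried through, so any downstream step has to handle missing values itself.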

Details

The continuous DDR measurement is passed to get_prediction(), which derives a binary prediction from it through logistic regression. The resulting F1 score indicates how well the words in the dictionary predict the binary coding when they are used in the DDR method. The F1 score is the harmonic mean of recall and precision (1).
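
The two steps described here, a logistic regression on a continuous score followed by the F1 computation, can be sketched in base R. This is an illustration under stated assumptions, not the code of get_prediction() or get_F1(): the toy data are made up, the helper f1_sketch() is hypothetical, and the 0.5 probability cutoff is an assumption.

```r
# Toy data (made up): a continuous score and a binary reference coding.
score     <- c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8)
reference <- c(0, 0, 1, 0, 1, 0, 1, 1)

# Logistic regression of the reference on the score.
fit <- glm(reference ~ score, family = binomial())

# Binary prediction; the 0.5 cutoff is an assumption of this sketch.
predicted <- as.integer(predict(fit, type = "response") > 0.5)

# Hypothetical helper (not part of dictvectoR): F1 as the harmonic mean
# of precision and recall, computed from the confusion counts.
f1_sketch <- function(predicted, reference) {
  tp <- sum(predicted == 1 & reference == 1)  # true positives
  fp <- sum(predicted == 1 & reference == 0)  # false positives
  fn <- sum(predicted == 0 & reference == 1)  # false negatives
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

f1 <- f1_sketch(predicted, reference)
f1
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a measurement only scores well on F1 if both precision and recall are reasonably high.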

References

(1) Chinchor, N. (1992). MUC-4 evaluation metrics. Proceedings of the 4th Conference on Message Understanding, 22–29. https://doi.org/10.3115/1072064.1072067

Examples

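library(dictvectoR)
library(magrittr)  # provides the %<>% assignment pipe used below

# Load the demo fastText model shipped with the package
model <- fastrtext::load_model(system.file("extdata",
                                           "tw_demo_model_sml.bin",
                                           package = "dictvectoR"))
# Clean the tweet text stored in the full_text column
tw_annot %<>% clean_text(text_field = "full_text")
dict <- c("skandal", "deutschland", "steuerzahler")
# F1 score of the dictionary against the binary 'pop' annotation
get_F1(tw_annot, dict, model, 'pop')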
#> [1] 0.2897527