Returns the F1 score of a DDR (Distributed Dictionary Representation) measurement when predicting a binary reference (i.e. a manually annotated variable).
Usage
get_F1(
  df,
  dictionary,
  model,
  reference,
  text_field = "text",
  replace_na = c("mean-sd", "min", 0, F)
)
Arguments
- df
A data.frame containing one annotated document or sentence per row.
- dictionary
A character vector containing the keywords of a dictionary, passed to cossim2dict().
- model
A fastText model as loaded by fastrtext::load_model().
- reference
Name of the binary reference column in df.
- text_field
Name of the column in df that contains the text of the documents. Default is "text".
- replace_na
Specifies the value used to replace NAs in the DDR measurement. Default is "mean-sd". Can take values:
"mean-sd" (character): replace NAs by the mean minus one standard deviation.
"min" (character): replace NAs by the minimum.
0 (numeric): replace NAs by 0.
FALSE (logical): do not replace NAs.
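The four replace_na options can be illustrated with a minimal sketch that applies each strategy to a numeric vector of DDR scores. The function replace_na_sketch() is hypothetical and only mirrors the behaviour described above; it is not dictvectoR's internal code.

```r
# Hypothetical helper illustrating the documented replace_na options;
# this is NOT dictvectoR's implementation.
replace_na_sketch <- function(x, replace_na = "mean-sd") {
  if (isFALSE(replace_na)) {
    return(x)  # FALSE: leave NAs untouched
  }
  fill <- if (identical(replace_na, "mean-sd")) {
    # assumption: mean and SD are computed from the non-missing scores
    mean(x, na.rm = TRUE) - sd(x, na.rm = TRUE)
  } else if (identical(replace_na, "min")) {
    min(x, na.rm = TRUE)  # smallest observed score
  } else if (is.numeric(replace_na)) {
    replace_na  # a fixed number such as 0
  } else {
    stop("unsupported replace_na value")
  }
  x[is.na(x)] <- fill
  x
}

scores <- c(0.2, NA, 0.4, 0.6)
replace_na_sketch(scores, "min")
#> [1] 0.2 0.2 0.4 0.6
```

Filling with a low value such as the mean minus one standard deviation treats a missing score as weak evidence for the concept rather than dropping the document.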
Details
The continuous DDR measurement is passed to get_prediction() to obtain a binary prediction through logistic regression.
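The step from a continuous score to a binary prediction can be sketched with base R's glm(). This is an illustration under stated assumptions, not the code of get_prediction(): the helper predict_binary_sketch() is hypothetical, and both the single-predictor model and the 0.5 probability cut-off are assumptions.

```r
# Hypothetical sketch, not get_prediction(): fit a logistic regression of the
# 0/1 reference on the continuous score, then threshold the fitted
# probabilities at an assumed cut-off of 0.5.
predict_binary_sketch <- function(score, reference) {
  fit <- glm(reference ~ score, family = binomial)
  prob <- predict(fit, type = "response")  # fitted probabilities
  as.integer(prob > 0.5)                   # 1 = predicted positive
}

score <- c(0.1, 0.2, 0.3, 0.5, 0.6, 0.9)
reference <- c(0, 0, 1, 0, 1, 1)
predict_binary_sketch(score, reference)
```

In this sketch, glm() drops rows whose score is NA by default, so the returned vector would be shorter than the input; replacing NAs beforehand (see replace_na) keeps one prediction per document.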
The F1 score therefore indicates how well the dictionary predicts the binary reference coding when used in the DDR method.
It is the harmonic mean of precision and recall (1).
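As a worked illustration of F1 = 2 * precision * recall / (precision + recall), the hypothetical helper f1_sketch() below computes the score from two 0/1 vectors, treating 1 as the positive class (an assumption about how the reference is coded).

```r
# Hypothetical helper: F1 as the harmonic mean of precision and recall,
# with 1 as the positive class.
f1_sketch <- function(predicted, reference) {
  tp <- sum(predicted == 1 & reference == 1)  # true positives
  fp <- sum(predicted == 1 & reference == 0)  # false positives
  fn <- sum(predicted == 0 & reference == 1)  # false negatives
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

f1_sketch(predicted = c(1, 1, 0, 0, 1), reference = c(1, 0, 1, 0, 1))
#> [1] 0.6666667
```

Here there are 2 true positives, 1 false positive and 1 false negative, so precision and recall are both 2/3 and so is their harmonic mean. When there are no true positives the sketch returns NaN; how get_F1() handles that case is not stated here.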
References
(1) Chinchor, N. (1992). MUC-4 evaluation metrics. Proceedings of the 4th Conference on Message Understanding, 22–29. https://doi.org/10.3115/1072064.1072067
Examples
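library(dictvectoR)
library(magrittr) # provides the %<>% assignment pipe
# load the small demo fastText model shipped with dictvectoR
model <- fastrtext::load_model(system.file("extdata",
                                           "tw_demo_model_sml.bin",
                                           package = "dictvectoR"))
# clean the texts in the full_text column
tw_annot %<>% clean_text(text_field = "full_text")
dict <- c("skandal", "deutschland", "steuerzahler")
get_F1(tw_annot, dict, model, 'pop')
#> [1] 0.2897527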