Efficiently computes Recall, Precision, and F1 scores for each keyword in a character vector, or each dictionary in a list, stored as a column of a data.frame. Adds Recall, Precision, and F1 as columns to that data.frame.
Usage
get_many_RPFs(
  keyword_df,
  keyword_field = "words",
  model,
  text_df,
  reference,
  text_field = "text",
  replace_na = c("mean-sd", "min", 0, F)
)
Arguments
- keyword_df
A data.frame, containing a column with a character vector of words, or with a list of dictionaries.
- keyword_field
character. The name of the column in keyword_df that holds either a character vector of single keywords or a list of dictionaries, each stored as a separate character vector with one element per word.
- model
A fastText model, loaded by load_model.
- text_df
A data.frame containing one annotated document per row.
- reference
character. Name of the binary reference column in text_df.
- text_field
Name of the column in text_df that contains the text of the documents. Default is "text".
- replace_na
Specifies the value used to replace NAs in the DDR measurement. Default is 'mean-sd'. Can take values:
'mean-sd' (character): replace NAs by mean - 1sd. Default.
'min' (character): replace NAs by minimum.
0 (numerical): replace NAs by 0.
FALSE (logical): do not replace NAs.
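The four replace_na options can be illustrated with a minimal base R sketch. The helper replace_na_sketch below is hypothetical and not part of the package; it assumes that "mean - 1sd" means the mean minus one standard deviation of the non-missing values, which the documentation does not spell out.

```r
# Hypothetical helper, not part of dictvectoR: illustrates the four
# documented replace_na options on a numeric vector of DDR scores.
replace_na_sketch <- function(x, replace_na = "mean-sd") {
  if (isFALSE(replace_na)) {
    return(x)  # FALSE: leave NAs untouched
  }
  fill <- if (identical(replace_na, "mean-sd")) {
    # Assumption: mean and sd are taken over the non-missing values
    mean(x, na.rm = TRUE) - sd(x, na.rm = TRUE)
  } else if (identical(replace_na, "min")) {
    min(x, na.rm = TRUE)
  } else if (identical(replace_na, 0)) {
    0
  } else {
    stop("unsupported replace_na value")
  }
  x[is.na(x)] <- fill
  x
}

# Invented toy scores with one missing value
ddr <- c(0.2, 0.4, NA, 0.6)
replace_na_sketch(ddr, "mean-sd")
replace_na_sketch(ddr, "min")
replace_na_sketch(ddr, 0)
replace_na_sketch(ddr, FALSE)
```

Whether the actual function computes the replacement value in exactly this way may differ; the sketch only mirrors the option descriptions above.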
Details
Recall, Precision, and F1 are computed for each element (i.e. word or dictionary) of the character vector or list in the data.frame. The F1 scores indicate the performance of these words/dictionaries in predicting a binary coding, when used in the DDR method. The gradual measure resulting from the DDR method is passed to a logistic regression, with the binary coding as dependent variable. Binary predictions are calculated from this logistic model and compared with the binary coding. The F1 score is the harmonic mean of Recall and Precision (Chinchor, 1992).
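The evaluation steps described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the helper rpf_sketch and the toy data are invented, and the 0.5 probability cutoff for turning fitted probabilities into binary predictions is an assumption that the documentation does not state.

```r
# Hypothetical helper, not part of dictvectoR: Recall, Precision, and F1
# from binary predictions and a binary reference coding.
rpf_sketch <- function(pred, truth) {
  tp <- sum(pred == 1 & truth == 1)  # true positives
  fp <- sum(pred == 1 & truth == 0)  # false positives
  fn <- sum(pred == 0 & truth == 1)  # false negatives
  recall <- tp / (tp + fn)
  precision <- tp / (tp + fp)
  f1 <- 2 * precision * recall / (precision + recall)  # harmonic mean
  c(recall = recall, precision = precision, F1 = f1)
}

# Invented toy data: a gradual score per document and its binary coding.
toy <- data.frame(
  score = c(0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.55, 0.60, 0.70, 0.80),
  coded = c(0, 0, 0, 1, 0, 0, 1, 1, 0, 1)
)

# Logistic regression with the binary coding as dependent variable.
fit <- glm(coded ~ score, data = toy, family = binomial())

# Assumed cutoff of 0.5 to turn fitted probabilities into binary predictions.
pred <- as.integer(fitted(fit) > 0.5)

rpf_sketch(pred, toy$coded)
```

If no document is predicted positive, precision is 0/0 and the sketch returns NaN; how get_many_RPFs treats that case is not covered here.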
References
Chinchor, N. (1992). MUC-4 evaluation metrics. Proceedings of the 4th Conference on Message Understanding, 22–29. https://doi.org/10.3115/1072064.1072067
See also
cossim2dict, get_prediction, get_F1, get_many_RPFs, confusionMatrix
Examples
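library(dictvectoR)
library(magrittr)  # provides the %<>% assignment pipe

# Load the small demo fastText model shipped with the package
model <- fastrtext::load_model(system.file("extdata",
                                           "tw_demo_model_sml.bin",
                                           package = "dictvectoR"))

# Clean the text of the annotated example data
tw_annot %<>% clean_text(text_field = "full_text")

# Three example dictionaries, one per row, stored in a list column
dict_df <- data.frame(id = 1:3)
dict_df$combis <- list(c("mehrheit deutschen", "merkel", "skandal"),
                       c("steuerzahler", "bundesregierung",
                         "komplett gescheitert"),
                       c("arbeitnehmer", "groko", "wahnsinn"))

# Score each dictionary against the binary reference column "pop"
get_many_RPFs(keyword_df = dict_df,
              keyword_field = "combis",
              model = model,
              text_df = tw_annot, reference = "pop")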
#> Joining, by = "rowid"
#> id combis recall precision
#> 1 1 mehrheit deutschen, merkel, skandal 0.08796296 0.5428571
#> 2 2 steuerzahler, bundesregierung, komplett gescheitert 0.16666667 0.6101695
#> 3 3 arbeitnehmer, groko, wahnsinn 0.05092593 0.6111111
#> F1
#> 1 0.15139442
#> 2 0.26181818
#> 3 0.09401709