Efficiently computes F1 scores for each keyword in a character vector, or for each dictionary in a list of dictionaries, stored in a data.frame. Scores are computed against the reference column of an annotated data.frame, separately for each level of group_field.
Usage
get_many_F1s_by_group(
  keyword_df,
  keyword_field = "words",
  id = "id",
  model,
  text_df,
  group_field,
  reference,
  text_field = "text",
  replace_na = c("mean-sd", "min", 0, F)
)
Arguments
- keyword_df
A data.frame containing a column that holds either a character vector of words or a list of dictionaries.
- keyword_field
character. The name of the column in keyword_df that is either a character vector of single keywords or a list of dictionaries, each stored as a separate character vector with one element per word. Default is "words".
- id
character. Name of the column in keyword_df that holds a unique identifier for each keyword. Default is "id".
- model
A fastText model, loaded by load_model.
- text_df
A data.frame containing one annotated document per row.
- group_field
character. Name of a categorical grouping variable in text_df.
- reference
character. Name of the binary reference column in text_df.
- text_field
character. Name of the column in text_df that contains the text of the documents. Default is "text".
- replace_na
Specifies the value used to replace NAs in the DDR measurement. Default is 'mean-sd'. Can take values:
'mean-sd' (character): replace NAs by mean - 1 sd. Default.
'min' (character): replace NAs by the minimum.
0 (numerical): replace NAs by 0.
FALSE (logical): do not replace NAs.
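The four replace_na options can be pictured with a small base R sketch. The helper replace_na_sketch below is hypothetical and not part of the package. It assumes the replacement statistic is computed over the non-missing values of a single numeric vector, which may differ from how the package handles NAs internally.

```r
# Hypothetical helper, NOT the package's implementation: applies one of the
# four documented replace_na options to a plain numeric vector.
replace_na_sketch <- function(x, replace_na = "mean-sd") {
  if (isFALSE(replace_na)) {
    return(x)  # FALSE: leave NAs untouched
  }
  fill <- if (identical(replace_na, "mean-sd")) {
    # mean minus one standard deviation of the observed values
    mean(x, na.rm = TRUE) - sd(x, na.rm = TRUE)
  } else if (identical(replace_na, "min")) {
    min(x, na.rm = TRUE)
  } else if (is.numeric(replace_na)) {
    replace_na  # a fixed value such as 0
  } else {
    stop("unsupported replace_na value")
  }
  x[is.na(x)] <- fill
  x
}

# Toy scores, invented for illustration
scores <- c(0.2, 0.4, NA, 0.6)
replace_na_sketch(scores, "mean-sd")
replace_na_sketch(scores, "min")
replace_na_sketch(scores, 0)
replace_na_sketch(scores, FALSE)
```

Filling with mean - 1 sd or with the minimum places documents without a measurement at the low end of the observed scores. FALSE keeps them missing.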
Details
See also: cossim2dict, get_prediction, get_F1, get_many_RPFs, confusionMatrix
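As a rough illustration of what a grouped F1 score means here, the following base R sketch computes F1 per group from a binary prediction and a binary reference. The helper f1_score, the toy data, and the convention of returning 0 when there are no true positives are all assumptions made for this example. How the package derives its binary prediction from the DDR measurement is not shown.

```r
# Hypothetical helper, NOT part of the package: F1 from two binary vectors.
f1_score <- function(pred, ref) {
  tp <- sum(pred == 1 & ref == 1)  # true positives
  fp <- sum(pred == 1 & ref == 0)  # false positives
  fn <- sum(pred == 0 & ref == 1)  # false negatives
  if (tp == 0) {
    return(0)  # convention assumed here: no true positives gives F1 = 0
  }
  precision <- tp / (tp + fp)
  recall <- tp / (tp + fn)
  2 * precision * recall / (precision + recall)
}

# Toy data, invented for illustration: one row per document
toy <- data.frame(
  party = c("A", "A", "A", "A", "B", "B", "B"),
  pop   = c(1, 1, 0, 0, 1, 0, 0),  # binary reference
  pred  = c(1, 0, 1, 0, 0, 0, 1)   # binary prediction
)

# One F1 score per level of the grouping variable
f1_by_group <- sapply(split(toy, toy$party),
                      function(g) f1_score(g$pred, g$pop))
f1_by_group
```

The result is a named vector with one F1 value per group, which corresponds to one row of the F1_* columns in the output of the example below.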
Examples
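library(dictvectoR)
library(magrittr)  # provides the %<>% assignment pipe

# Load the small demo fastText model shipped with the package
model <- fastrtext::load_model(system.file("extdata",
                                           "tw_demo_model_sml.bin",
                                           package = "dictvectoR"))

# Clean the annotated example data (%<>% assigns the result back to tw_annot)
tw_annot %<>% clean_text(text_field = "full_text")

# One dictionary per row, stored as a list column of character vectors
dict_df <- data.frame(id = 1:3)
dict_df$combis <- list(c("mehrheit deutschen", "merkel", "skandal"),
                       c("steuerzahler", "bundesregierung",
                         "komplett gescheitert"),
                       c("arbeitnehmer", "groko", "wahnsinn"))

# F1 score of each dictionary, computed separately for each party
get_many_F1s_by_group(keyword_df = dict_df,
                      keyword_field = "combis",
                      id = "id",
                      model = model,
                      text_df = tw_annot,
                      group_field = "party",
                      reference = "pop")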
#> Joining, by = "id"
#> id combis F1_AfD F1_B90Grune
#> 1 1 mehrheit deutschen, merkel, skandal 0.6447368 0
#> 2 2 steuerzahler, bundesregierung, komplett gescheitert 0.6666667 0
#> 3 3 arbeitnehmer, groko, wahnsinn 0.6953642 0
#> F1_CDU F1_CSU F1_FDP F1_Linke F1_SPD
#> 1 0 0 0 0.0000000 0
#> 2 0 0 0 0.4210526 0
#> 3 0 0 0 0.3548387 0