How to use average F1 score in GridSearchCV with OneVsRestClassifier?

I have an imbalanced multi-label classification problem and use the (simplified) code below to determine hyperparameters.

# Imports (scikit-learn, plus scikit-multilearn for IterativeStratification)
from sklearn.pipeline import Pipeline
from sklearn.multiclass import OneVsRestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import f1_score, make_scorer
from skmultilearn.model_selection import IterativeStratification

# Pipeline
pipeline = Pipeline([
    ('clf', OneVsRestClassifier(LogisticRegression()))
])
    
# Parameters to test in Grid Search
parameters = {
    'clf__estimator__C': [1] 
}

# Use stratified sampling in each iteration
stratified_k_fold_cv = IterativeStratification(n_splits=2, order=1)
# Optimize for weighted F1-score
scorer = make_scorer(f1_score, average="weighted")

# Grid Search
grid_lr = GridSearchCV(pipeline, parameters, cv=stratified_k_fold_cv, scoring=scorer)
grid_lr.fit(X_train_tfidf, Y_train)

# Print results
print("Best Parameters: {}".format(grid_lr.best_params_))
print("Mean cross-validated F1-score of the best estimator: {}".format(grid_lr.best_score_))

For each of the 15 labels I have binary classes which are highly imbalanced, sometimes towards the 0 class and sometimes towards the 1 class. Therefore, for each label, I would like to look at the average F1 score of the 0 and 1 class, using f1_score(average="weighted"). When it comes to aggregating the F1 scores across labels, I would then like to take the average of those 15 per-label averages. How can this be implemented? In make_scorer I can only specify the weighted average once, and I think that controls how the per-label F1 scores are combined, not how the 0 and 1 class scores are averaged within each individual label.
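Conceptually, I imagine something like the sketch below: a custom metric that loops over the label columns, computes the weighted F1 for each label separately, and then takes the plain mean of those 15 values, wrapped with make_scorer so it can be passed to GridSearchCV. The function name per_label_weighted_f1 is just a placeholder, and I am not sure whether this is the intended way to do it:

import numpy as np
from sklearn.metrics import f1_score, make_scorer

def per_label_weighted_f1(y_true, y_pred):
    # Ensure 2-D indicator arrays so we can index label columns
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    # Weighted F1 over the 0/1 classes, computed per label column
    scores = [
        f1_score(y_true[:, i], y_pred[:, i], average="weighted")
        for i in range(y_true.shape[1])
    ]
    # Plain mean of the 15 per-label scores
    return np.mean(scores)

mean_weighted_f1_scorer = make_scorer(per_label_weighted_f1)

# Would then be used as:
# grid_lr = GridSearchCV(pipeline, parameters, cv=stratified_k_fold_cv,
#                        scoring=mean_weighted_f1_scorer)

Is this kind of custom scorer the right approach, or is there a built-in averaging option that already does this?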



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow