How to set a threshold for a scikit-learn random forest model
After looking at the precision_recall_curve, I want to set the threshold to 0.4. How do I apply 0.4 to my random forest model (binary classification), so that any probability < 0.4 is labeled 0 and any probability >= 0.4 is labeled 1?
from sklearn.ensemble import RandomForestClassifier
random_forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=12)
random_forest.fit(X_train, y_train)
from sklearn.metrics import accuracy_score
predicted = random_forest.predict(X_test)
accuracy = accuracy_score(y_test, predicted)
Documentation Precision recall
Solution 1:[1]
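from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
random_forest = RandomForestClassifier(n_estimators=100)
random_forest.fit(X_train, y_train)
threshold = 0.4
# predict_proba returns a 2-D array with one column per class, in the order of classes_
predicted_proba = random_forest.predict_proba(X_test)
# threshold only the positive-class column (column 1 when the labels are 0 and 1) to get 1-D labels
predicted = (predicted_proba[:, 1] >= threshold).astype('int')
accuracy = accuracy_score(y_test, predicted)
print(round(accuracy * 100, 2), "%")
Note: thresholding both columns in place and passing the resulting 2-D array to accuracy_score fails at the accuracy step with "ValueError: Can't handle mix of binary and multilabel-indicator".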
Solution 2:[2]
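sklearn.metrics.accuracy_score expects 1-D arrays of labels, but the array returned by predict_proba is 2-D (one column per class), so passing it in directly raises the error. Threshold a single class column first to get a 1-D array of labels.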
https://scikit-learn.org/stable/modules/generated/sklearn.metrics.accuracy_score.html
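To make the shape issue concrete, here is a minimal sketch of applying the 0.4 threshold and scoring the result. The synthetic data from make_classification and the train/test split are assumptions standing in for the asker's X_train, X_test, y_train and y_test. The positive_col lookup is a hypothetical helper variable that finds the column for class 1 instead of hard-coding it.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumption: synthetic binary data stands in for the asker's dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=12)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=12
)

random_forest = RandomForestClassifier(n_estimators=100, random_state=12)
random_forest.fit(X_train, y_train)

threshold = 0.4
proba = random_forest.predict_proba(X_test)  # 2-D: (n_samples, n_classes)

# Columns follow random_forest.classes_; locate the column for label 1.
positive_col = list(random_forest.classes_).index(1)

# 1-D array of 0/1 labels: 1 where P(class 1) >= threshold, else 0.
predicted = (proba[:, positive_col] >= threshold).astype(int)

accuracy = accuracy_score(y_test, predicted)
print(proba.shape, predicted.shape)
print(f"{accuracy * 100:.2f} %")
```

Because only one column is thresholded, predicted has the same 1-D shape as y_test, which is what accuracy_score needs. A threshold below 0.5 can only keep or increase the number of samples labeled 1 compared with random_forest.predict.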
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | edesz |
| Solution 2 | |
