RandomForestClassifier instance not fitted yet. Call 'fit' with appropriate arguments before using this method

I am trying to train a decision tree model, save it, and then reload it when I need it later. However, I keep getting the following error:

This DecisionTreeClassifier instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

Here is my code:

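import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X_train, X_test, y_train, y_test = train_test_split(data, label, test_size=0.20, random_state=4)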

names = ["Decision Tree", "Random Forest", "Neural Net"]

classifiers = [
    DecisionTreeClassifier(),
    RandomForestClassifier(),
    MLPClassifier()
    ]

score = 0
for name, clf in zip(names, classifiers):
    if name == "Decision Tree":
        clf = DecisionTreeClassifier(random_state=0)
        grid_search = GridSearchCV(clf, param_grid=param_grid_DT)
        grid_search.fit(X_train, y_train_TF)
        if grid_search.best_score_ > score:
            score = grid_search.best_score_
            best_clf = clf
    elif name == "Random Forest":
        clf = RandomForestClassifier(random_state=0)
        grid_search = GridSearchCV(clf, param_grid_RF)
        grid_search.fit(X_train, y_train_TF)
        if grid_search.best_score_ > score:
            score = grid_search.best_score_
            best_clf = clf

    elif name == "Neural Net":
        clf = MLPClassifier()
        clf.fit(X_train, y_train_TF)
        y_pred = clf.predict(X_test)
        current_score = accuracy_score(y_test_TF, y_pred)
        if current_score > score:
            score = current_score
            best_clf = clf


pkl_filename = "pickle_model.pkl"  
with open(pkl_filename, 'wb') as file:  
    pickle.dump(best_clf, file)

import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone package
# Save to file in the current working directory
joblib_file = "joblib_model.pkl"  
joblib.dump(best_clf, joblib_file)

print("best classifier: ", best_clf, " Accuracy= ", score)

Here is how I load the model and test it:

#First method
with open(pkl_filename, 'rb') as h:
    loaded_model = pickle.load(h) 
#Second method 
joblib_model = joblib.load(joblib_file)

As you can see, I have tried two ways of saving it, but neither has worked.

Here is how I tested:

print(loaded_model.predict(test)) 
print(joblib_model.predict(test)) 

You can clearly see that the models are actually fitted, and if I try any other model, such as SVM or logistic regression, the method works just fine.
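A quick way to check whether an object was fitted before it was saved is scikit-learn's `check_is_fitted`, which raises `NotFittedError` for an unfitted estimator. The sketch below is only illustrative: it uses synthetic data from `make_classification`, and the helper `is_fitted` is a hypothetical name, not part of scikit-learn. It shows that pickling preserves whatever state the object had, so an unfitted estimator stays unfitted after a save/load round trip.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.exceptions import NotFittedError
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils.validation import check_is_fitted


def is_fitted(estimator):
    """Hypothetical helper: True if scikit-learn considers `estimator` fitted."""
    try:
        check_is_fitted(estimator)
        return True
    except NotFittedError:
        return False


# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=100, random_state=0)

unfitted = DecisionTreeClassifier(random_state=0)
fitted = DecisionTreeClassifier(random_state=0).fit(X, y)

# Round-trip both objects through pickle, as in the question.
reloaded_unfitted = pickle.loads(pickle.dumps(unfitted))
reloaded_fitted = pickle.loads(pickle.dumps(fitted))

print(is_fitted(reloaded_unfitted))  # False
print(is_fitted(reloaded_fitted))    # True
```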



Solution 1:[1]

The problem is in this line:

best_clf = clf

You pass clf to GridSearchCV, which clones the estimator and fits the clones on the data. Your original clf therefore remains untouched and unfitted. The same applies to the Random Forest branch.
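The cloning behaviour can be observed directly. This is a minimal sketch under two assumptions: synthetic data from `make_classification`, and an illustrative `max_depth` grid standing in for the question's `param_grid_DT`, which is not shown. After fitting, the object passed to `GridSearchCV` has no fitted attributes (such as `tree_`), while the search's own `best_estimator_` does.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=200, random_state=0)

clf = DecisionTreeClassifier(random_state=0)
# Hypothetical grid; the question's param_grid_DT is not shown.
param_grid = {"max_depth": [2, 4, None]}

grid_search = GridSearchCV(clf, param_grid=param_grid, cv=3)
grid_search.fit(X, y)

# The fitted model is a clone, not the object that was passed in ...
print(grid_search.best_estimator_ is clf)             # False
# ... and the original clf gained no fitted attributes such as tree_.
print(hasattr(clf, "tree_"))                          # False
print(hasattr(grid_search.best_estimator_, "tree_"))  # True
```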

What you need is

best_clf = grid_search

to save the fitted grid_search model.
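A sketch of that fix applied to the question's save/load flow, assuming synthetic data and an illustrative grid in place of the question's `data`, `label` and `param_grid_DT`. With the default `refit=True`, calling `predict` on the reloaded `GridSearchCV` object delegates to its refitted best estimator, so no `NotFittedError` is raised.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data standing in for the question's data and labels.
X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# Illustrative grid; stands in for the question's param_grid_DT.
grid_search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                           param_grid={"max_depth": [2, 4, None]}, cv=3)
grid_search.fit(X_train, y_train)

best_clf = grid_search  # keep the fitted search object, not the unfitted clf

with open("pickle_model.pkl", "wb") as f:
    pickle.dump(best_clf, f)

with open("pickle_model.pkl", "rb") as f:
    loaded_model = pickle.load(f)

# With refit=True (the default), GridSearchCV.predict uses best_estimator_.
y_pred = loaded_model.predict(X_test)
print(y_pred[:10])
```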

If you don't want to save the entire grid_search object, you can use its best_estimator_ attribute (available when refit=True, the default) to get the actual fitted clone.

best_clf = grid_search.best_estimator_
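A minimal sketch of saving only the best estimator with joblib, assuming synthetic data and an illustrative random forest grid in place of the question's `param_grid_RF`. Note that `joblib` is imported as a standalone package here, since `sklearn.externals.joblib` no longer exists in current scikit-learn releases.

```python
import joblib  # standalone package; sklearn.externals.joblib was removed in scikit-learn 0.23

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.utils.validation import check_is_fitted

# Synthetic data, used only for illustration.
X, y = make_classification(n_samples=200, random_state=0)

# Illustrative grid; stands in for the question's param_grid_RF.
grid_search = GridSearchCV(RandomForestClassifier(n_estimators=20, random_state=0),
                           param_grid={"max_depth": [3, None]}, cv=3)
grid_search.fit(X, y)

# best_estimator_ exists only when refit=True (the default).
best_clf = grid_search.best_estimator_
check_is_fitted(best_clf)  # raises NotFittedError if the model were not fitted

joblib.dump(best_clf, "joblib_model.pkl")
joblib_model = joblib.load("joblib_model.pkl")

print(type(joblib_model).__name__)  # RandomForestClassifier
print(joblib_model.predict(X[:5]))
```

Saving only `best_estimator_` keeps the file smaller than pickling the whole search object, at the cost of losing `cv_results_` and the other search metadata.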

Solution 2:[2]

I just wanted to add a little to the answer above. Even if you manually copy and paste the pickle file into a different directory where you want to load the model, you end up with that error. If you want to move the file, use cut and paste.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Vivek Kumar
Solution 2 Rupesh Suryawanshi