PyCaret predict error in multiclass classification using Colab
I'm using the PyCaret library in Colab to make a simple prediction on this dataset:
https://www.kaggle.com/andrewmvd/fetal-health-classification
When I run my code:
from pycaret.utils import enable_colab
enable_colab()
from google.colab import drive
drive.mount('/content/drive')
import pandas as pd
from pycaret.classification import *
from pandas_profiling import ProfileReport
df= pd.read_csv("/content/drive/MyDrive/Pycaret/fetal_health.csv")
df2 = df.iloc[:,:11]
df2['fetal_health'] = df['fetal_health']
test = df2.sample(frac=0.10, random_state=42, weights='fetal_health')
train = df2.drop(test.index)
test.reset_index(inplace=True, drop=True)
train.reset_index(inplace=True, drop=True)
clf = setup(data =train, target = 'fetal_health', session_id=42,
log_experiment=True, experiment_name='fetal', normalize=True)
best = compare_models(sort="Accuracy")
rf = create_model('rf', fold=30)
tuned_rf = tune_model(rf, optimize='Accuracy')
predict_model(tuned_rf)
I get this error:
I think this is because my target variable is imbalanced, which is causing the predictions to be incorrect.
Can someone please help me understand? Thanks in advance.
Solution 1:[1]
Have you run each step in a separate cell to check the outputs?
Run
clf = setup(data =train, target = 'fetal_health', session_id=42,
log_experiment=True, experiment_name='fetal', normalize=True)
and check:
Are all variable types correctly inferred? (E.g., using your code with the Kaggle dataset of the same name, all variables show as numeric except for severe_decelerations, which shows as "Categorical". Is that correct?)
Is there any preprocessing configuration that needs to change? I'm sure your issue has nothing to do with an imbalanced target variable, but you can test this yourself by adding fix_imbalance = True to your setup (the default is False, as shown in the setup output).
You can learn more about the available preprocessing configurations here:
https://pycaret.gitbook.io/docs/get-started/preprocessing
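Before changing the setup, it may help to see how imbalanced the target really is. Here is a minimal sketch, assuming the target values can be pulled into a plain list (for example with train['fetal_health'].tolist()). The class_proportions helper and the toy values are hypothetical and are not taken from the question's data.

```python
from collections import Counter


def class_proportions(labels):
    """Return {label: fraction of rows with that label}, ordered by label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: counts[label] / total for label in sorted(counts)}


# Hypothetical toy target column standing in for train['fetal_health'].tolist()
toy_target = [1.0] * 7 + [2.0] * 2 + [3.0] * 1

proportions = class_proportions(toy_target)
print(proportions)
```

If one class dominates, Accuracy alone can look good while the minority classes are predicted poorly, so it is worth comparing results with and without fix_imbalance = True.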
Also, while troubleshooting, you can save yourself some work by using
best_model = create_model(best, fold=30)
predict_model(best_model)
(No need to look up the best model and add it manually to create_model(),
or to use tune_model(), until you get the model working.)
Solution 2:[2]
I found what the problem was: my target variable starts at 1 and has 3 distinct values. This causes an error when PyCaret builds a list comprehension over the classes (because it starts from index zero). To solve it, I just transformed my target variable to start at zero, and it worked fine.
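The remapping described above could look roughly like this. It is only a sketch: the zero_based_label_map helper and the toy values are hypothetical, and it assumes the target holds the labels 1, 2 and 3 as in the question. Whether the remap is needed at all may depend on the PyCaret version in use.

```python
def zero_based_label_map(labels):
    """Map each distinct label to an integer code 0..n-1, in sorted order."""
    return {label: code for code, label in enumerate(sorted(set(labels)))}


# Hypothetical toy target column; the real one would be train['fetal_health']
toy_target = [1.0, 2.0, 3.0, 1.0, 2.0]

label_map = zero_based_label_map(toy_target)
encoded = [label_map[value] for value in toy_target]

# Keep the inverse mapping so predictions can be translated back
# to the original labels after predict_model().
inverse_map = {code: label for label, code in label_map.items()}
decoded = [inverse_map[code] for code in encoded]

print(label_map)
print(encoded)
```

With a pandas DataFrame, the same dictionary can be applied before calling setup() with train['fetal_health'] = train['fetal_health'].map(label_map). Keeping inverse_map around makes it easy to report predictions in the original 1 to 3 coding.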
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | A. Beal |
| Solution 2 | leandro minervino |


