PyCaret predict error in multiclass classification using Colab

I'm using the PyCaret library in Colab to make a simple prediction on this dataset:

https://www.kaggle.com/andrewmvd/fetal-health-classification

When I run my code:

from pycaret.utils import enable_colab 
enable_colab()


from google.colab import drive
drive.mount('/content/drive')


import pandas as pd
from pycaret.classification import *
from pandas_profiling import ProfileReport


df= pd.read_csv("/content/drive/MyDrive/Pycaret/fetal_health.csv")


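# Keep the first 11 feature columns plus the target; copy() avoids a SettingWithCopyWarning
df2 = df.iloc[:, :11].copy()
df2['fetal_health'] = df['fetal_health']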



test = df2.sample(frac=0.10, random_state=42, weights='fetal_health')
train = df2.drop(test.index)

test.reset_index(inplace=True, drop=True)
train.reset_index(inplace=True, drop=True)


clf = setup(data =train, target = 'fetal_health', session_id=42,
 log_experiment=True, experiment_name='fetal', normalize=True)

best = compare_models(sort="Accuracy")


rf = create_model('rf', fold=30)


tuned_rf = tune_model(rf, optimize='Accuracy')


predict_model(tuned_rf)

I get this error:

[image: error message]

I think this is because my target variable is imbalanced (see image), which is causing the predictions to be incorrect.

[image: target variable distribution]

Can someone please help me understand? Thanks in advance.



Solution 1:[1]

Have you run each step in a separate cell to check the outputs?

Run

clf = setup(data =train, target = 'fetal_health', session_id=42,
 log_experiment=True, experiment_name='fetal', normalize=True)

and check:

  1. Are all variable types correctly inferred? (E.g., running your code on the Kaggle dataset of the same name, all variables show as numeric except for severe_decelerations, which shows as "Categorical". Is that correct?)

  2. Is there any preprocessing configuration that needs to change? I'm sure your issue has nothing to do with an imbalanced target variable, but you can test this yourself by adding fix_imbalance=True to your setup call (the default is False, as shown in the setup output).
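As a rough sketch of the two checks above, the setup options can be collected in one place and toggled while troubleshooting. The helper name make_setup_kwargs is hypothetical (it is not part of PyCaret), and whether severe_decelerations should really be forced to numeric is an assumption you need to verify against the data. The PyCaret call itself is left commented out because it needs the train DataFrame from the question.

```python
def make_setup_kwargs(force_numeric=(), fix_imbalance=False):
    """Build keyword arguments for pycaret.classification.setup().

    Hypothetical helper for troubleshooting; not part of PyCaret.
    """
    kwargs = {
        "target": "fetal_health",
        "session_id": 42,
        "normalize": True,
        "fix_imbalance": fix_imbalance,  # PyCaret's default is False
    }
    if force_numeric:
        # Override type inference for columns PyCaret reports as categorical.
        kwargs["numeric_features"] = list(force_numeric)
    return kwargs


kwargs = make_setup_kwargs(
    force_numeric=["severe_decelerations"],  # assumption: column is numeric
    fix_imbalance=True,
)
print(kwargs)

# With PyCaret installed and `train` defined as in the question:
# from pycaret.classification import setup
# clf = setup(data=train, **kwargs)
```

Running setup once with fix_imbalance=False and once with True, and comparing the predict_model result, is a quick way to confirm or rule out the imbalance hypothesis.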

You can learn more about the available preprocessing configurations here:

https://pycaret.gitbook.io/docs/get-started/preprocessing

Also, while troubleshooting, you can save yourself some work by using

best_model = create_model(best, fold=30)
predict_model(best_model)

(No need to look up the best model and add it manually to create_model(), or to use tune_model(), until you have the model working.)

Solution 2:[2]

I found the problem: my target variable starts at 1 and has 3 distinct values. This causes an error when PyCaret runs a list comprehension (because it starts at index zero). To solve it, I transformed the target so that its values start at zero, and it worked fine.
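One possible way to do that transformation, as a minimal sketch rather than the exact code used in this answer: map the distinct class labels to consecutive integers starting at 0. The function name zero_based_labels and the sample label list are illustrative assumptions.

```python
def zero_based_labels(labels):
    """Map class labels to consecutive integers starting at 0.

    Returns the encoded labels and the mapping used, so predictions
    can be translated back to the original classes later.
    """
    classes = sorted(set(labels))
    mapping = {original: index for index, original in enumerate(classes)}
    encoded = [mapping[value] for value in labels]
    return encoded, mapping


# Illustrative labels: three classes starting at 1, as described above.
sample = [1, 2, 3, 1, 2]
encoded, mapping = zero_based_labels(sample)
print(encoded)  # [0, 1, 2, 0, 1]
print(mapping)  # {1: 0, 2: 1, 3: 2}
```

With the DataFrame from the question, the same mapping could be applied before calling setup, for example with train['fetal_health'] = train['fetal_health'].map(mapping), and likewise for test.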

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    A. Beal
Solution 2    leandro minervino