Housing Machine Learning Error: "Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead"

So I'm trying to write a machine learning script that compares the cross_val_score of several algorithms on some housing data to determine which algorithm is most accurate at predicting house value. I'm working from an outline I used in a previous project to predict the species of iris, although this dataset is much larger, with many more features to take into consideration (this one is 506x14; the last was 150x4).

I was expecting X to be the array of all of the values except the final column, which is the median house value, Y. I did a simple split and originally tried passing those values straight to cross_val_score. However, I got an error saying the function only accepts binary or multiclass targets and was receiving continuous ones. An answer on Stack Overflow said to use keras.utils.to_categorical to make the data binary, so I tried that with the values. It threw the error "Supported target types are: ('binary', 'multiclass'). Got 'multilabel-indicator' instead". The only solution I found was to call to_categorical after StratifiedKFold, but that hasn't fixed the error.
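To illustrate what the error messages are reacting to, here is a small reproduction using scikit-learn's type_of_target helper (np.eye stands in for keras.utils.to_categorical, which produces the same one-hot layout); the values are made up, not from the housing data:

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

y_cont = np.array([24.0, 21.6, 34.7])   # continuous values, like MEDV
y_class = np.array([0, 1, 2])           # integer class labels
y_onehot = np.eye(3)[y_class]           # 2-D one-hot matrix, like to_categorical output

print(type_of_target(y_cont))    # 'continuous'  -> rejected by StratifiedKFold
print(type_of_target(y_class))   # 'multiclass'  -> accepted
print(type_of_target(y_onehot))  # 'multilabel-indicator' -> also rejected
```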

# Load dataset
url = "https://raw.githubusercontent.com/jbrownlee/Datasets/master/housing.csv"
names = ['CRIM', 'ZN', 'INDUS', 'CHAS', 'NOX', 'RM', 'AGE', 'DIS', 'RAD', 'TAX', 'PTRATIO', 'B', 'LSTAT', 'MEDV']
dataset = read_csv(url, names=names)
# Split-out validation dataset
array = dataset.values
X = array[:, 0:13]
y = array[:, 13]
X_train, X_validation, Y_train, Y_validation = train_test_split(X, y, test_size=0.20, random_state=1, shuffle=True)
# Spot check Algorithms
models = []
models.append(('LR', LogisticRegression(solver='liblinear', multi_class='ovr')))
models.append(('LDA', LinearDiscriminantAnalysis()))
models.append(('KNN', KNeighborsClassifier()))
models.append(('CART', DecisionTreeClassifier()))
models.append(('NB', GaussianNB()))
models.append(('SVM', SVC(gamma='auto')))
# evaluate each model in turn
results = []
names = []
for name, model in models:
    kfold = StratifiedKFold(n_splits=10, random_state=1, shuffle=True)
    # convert class vectors to binary class matrices
    X_train = keras.utils.to_categorical(X_train, 0)
    X_validation = keras.utils.to_categorical(X_validation, 0)
    Y_train = keras.utils.to_categorical(Y_train, 0)
    Y_validation = keras.utils.to_categorical(Y_validation, 0)
    cv_results = cross_val_score(model, X_train, Y_train, cv=kfold, scoring='accuracy')
    results.append(cv_results)
    names.append(name)
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))
# Compare Algorithms
pyplot.boxplot(results, labels=names)
pyplot.title('Algorithm Comparisons')
pyplot.show()

Any help figuring out why my data isn't being passed through the scorer correctly would be greatly appreciated.



Solution 1:[1]

First of all, I must say that classification and regression are different problems in machine learning; you can learn more about them here.

Right now you are solving a regression problem (housing) with a solution developed for a classification problem (iris).

You have two options

  1. Solve the problem as regression (some of the models you used may not have a regression counterpart)
  2. Convert your problem to classification (which has been asked here!)
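For the second option, one common approach (just a sketch with made-up values, not code from the linked question) is to bin the continuous MEDV target into a few classes, for example by quantiles, and classify those:

```python
import numpy as np

# Hypothetical example values standing in for a continuous target such as MEDV
y = np.array([15.0, 21.6, 34.7, 33.4, 36.2, 28.7])

# Tercile edges -> three roughly equal-sized classes (0 = low, 1 = mid, 2 = high)
edges = np.quantile(y, [1 / 3, 2 / 3])
y_class = np.digitize(y, edges)

print(y_class)  # -> [0 0 2 1 2 1]
```

The resulting integer labels are a 'multiclass' target, which StratifiedKFold and the classifiers will accept.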

This is my implementation of the first solution, which was achieved with minor changes to your code ;)

from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.svm import SVR

models = []
models.append(('LR', LinearRegression()))
# models.append(('LDA', LinearDiscriminantAnalysis()))  # no regression version
models.append(('KNN', KNeighborsRegressor()))
models.append(('CART', DecisionTreeRegressor()))
models.append(('NB', GaussianNB()))  # GaussianNB has no regressor; GaussianProcessRegressor() is a substitute
models.append(('SVM', SVR(gamma='auto')))
# evaluate each model in turn
results = []
names = []
for name, model in models:
    # default scoring for regressors is R^2
    cv_results = cross_val_score(model, X, y, cv=5)
    results.append(cv_results)
    names.append(name)
    print('%s: %f (%f)' % (name, cv_results.mean(), cv_results.std()))
# Compare Algorithms
pyplot.boxplot(results, labels=names)
pyplot.title('Algorithm Comparisons')
pyplot.show()

Finally, train_test_split and cross_val_score are two alternative methods for evaluating the quality of a model. Using both at the same time is not recommended!

Solution 2:[2]

For regression, you have to use KFold instead of StratifiedKFold, since stratification only makes sense for class labels. So it would be:

kfold = KFold(n_splits=10, random_state=1, shuffle=True)
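To confirm that KFold accepts a continuous target where StratifiedKFold raises the original error, here is a self-contained sketch on synthetic data (not the housing set):

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(1)
X = rng.rand(30, 3)
y = rng.rand(30)  # continuous target, as in a regression problem

# KFold splits purely by position, so a continuous y is fine
kfold = KFold(n_splits=5, random_state=1, shuffle=True)
scores = cross_val_score(DecisionTreeRegressor(), X, y, cv=kfold)
print(len(scores))  # one score per fold

# StratifiedKFold needs class labels and rejects a continuous y
try:
    next(StratifiedKFold(n_splits=5).split(X, y))
except ValueError as e:
    print('StratifiedKFold rejected it:', e)
```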

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1: meti
Solution 2: Binata Roy