Python ML Model with ExtraTreesClassifier

Can anybody see what I'm doing wrong?

Input:

from sklearn.ensemble import ExtraTreesClassifier
modelo = ExtraTreesClassifier()
modelo.fit(x_treino,y_treino)

resultado = modelo.score(x_teste, y_teste)
print("Acurácia", resultado)

Output: I'm getting this error

D:\Anaconda\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

D:\Anaconda\lib\site-packages\sklearn\utils\validation.py in check_X_y(X, y, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric, estimator)
    869         raise ValueError("y cannot be None")
    870 
--> 871     X = check_array(X, accept_sparse=accept_sparse,
    872                     accept_large_sparse=accept_large_sparse,
    873                     dtype=dtype, order=order, copy=copy,

D:\Anaconda\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
     61             extra_args = len(args) - len(all_args)
     62             if extra_args <= 0:
---> 63                 return f(*args, **kwargs)
     64 
     65             # extra_args > 0

D:\Anaconda\lib\site-packages\sklearn\utils\validation.py in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator)
    671                     array = array.astype(dtype, casting="unsafe", copy=False)
    672                 else:
--> 673                     array = np.asarray(array, order=order, dtype=dtype)
    674             except ComplexWarning as complex_warning:
    675                 raise ValueError("Complex data not supported\n"

D:\Anaconda\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order, like)
    100         return _asarray_with_like(a, dtype=dtype, order=order, like=like)
    101 
--> 102     return array(a, dtype, copy=False, order=order)
    103 
    104 

D:\Anaconda\lib\site-packages\pandas\core\generic.py in __array__(self, dtype)
   1991 
   1992     def __array__(self, dtype: NpDtype | None = None) -> np.ndarray:
-> 1993         return np.asarray(self._values, dtype=dtype)
   1994 
   1995     def __array_wrap__(

D:\Anaconda\lib\site-packages\numpy\core\_asarray.py in asarray(a, dtype, order, like)
    100         return _asarray_with_like(a, dtype=dtype, order=order, like=like)
    101 
--> 102     return array(a, dtype, copy=False, order=order)
    103 
    104 

ValueError: could not convert string to float: 'M'




Solution 1:[1]

From your comment I can see that the data has only two numerical variables (Age, Na to K), while the rest are categorical (Sex, Blood Pressure, Cholesterol, Drug). ExtraTreesClassifier only accepts numerical features, hence the ValueError: it tries to convert the string 'M' to a float and fails. To fix this, first preprocess the data so that the categorical columns are encoded as numbers the classifier can handle.

Encoders that fit your case include LabelEncoder (intended for the target column), OrdinalEncoder, or OneHotEncoder, depending on the data and your methods.

Here's the full documentation page (you're looking for categorical to numeric transformations):
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.preprocessing
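
As a rough sketch of the preprocessing described above, the categorical columns can be one-hot encoded inside a pipeline before the classifier sees them. The DataFrame below is made-up toy data, and the column names (Sex, BP, Cholesterol, Na_to_K, Drug) and values are assumptions based on the description of your dataset, so adjust them to your real columns. OneHotEncoder is only one of the options mentioned; OrdinalEncoder could be swapped in the same way.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: column names and values are assumptions, not the real dataset.
dados = pd.DataFrame({
    "Age": [23, 47, 47, 28, 61, 22, 49, 41, 60, 43, 34, 50],
    "Sex": ["F", "M", "M", "F", "F", "F", "F", "M", "M", "M", "F", "M"],
    "BP": ["HIGH", "LOW", "LOW", "NORMAL", "LOW", "NORMAL",
           "NORMAL", "LOW", "NORMAL", "LOW", "HIGH", "HIGH"],
    "Cholesterol": ["HIGH", "HIGH", "HIGH", "HIGH", "HIGH", "HIGH",
                    "HIGH", "HIGH", "HIGH", "NORMAL", "NORMAL", "NORMAL"],
    "Na_to_K": [25.4, 13.1, 10.1, 7.8, 18.0, 8.6, 16.3, 11.0, 15.2, 19.4, 9.5, 12.7],
    "Drug": ["drugA", "drugB", "drugB", "drugA", "drugA", "drugA",
             "drugA", "drugB", "drugA", "drugA", "drugB", "drugB"],
})

x = dados.drop(columns="Drug")
y = dados["Drug"]  # scikit-learn classifiers accept string class labels as the target

x_treino, x_teste, y_treino, y_teste = train_test_split(
    x, y, test_size=0.25, random_state=0
)

# One-hot encode the string columns; numeric columns pass through unchanged.
colunas_categoricas = ["Sex", "BP", "Cholesterol"]
preprocessador = ColumnTransformer(
    transformers=[
        ("cat", OneHotEncoder(handle_unknown="ignore"), colunas_categoricas),
    ],
    remainder="passthrough",
)

modelo = Pipeline([
    ("prep", preprocessador),
    ("clf", ExtraTreesClassifier(random_state=0)),
])
modelo.fit(x_treino, y_treino)

resultado = modelo.score(x_teste, y_teste)
print("Acurácia", resultado)
```

Putting the encoder inside the pipeline means it is fitted on the training split only and then applied unchanged to the test split, so `modelo.score` and `modelo.predict` can be called on the raw DataFrame.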

If your question was answered please consider marking it as solved.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Mario