OneHotEncoder fit error: ValueError: could not convert string to float: b
import numpy as np
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(dtype=np.str)
enc.fit([['b', 'c', 'd'], ['a', 'd', 'f']])
print(enc.transform([['a', 'd', 'f']]).toarray())
ValueError: could not convert string to float: b
My sklearn version is 0.19.2.
It also fails with dtype=np.int64.
Solution 1:[1]
The dtype parameter of OneHotEncoder sets the dtype of the encoded output, and only numeric types are supported. You are passing np.str, which is why you get the error.
import numpy as np
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(dtype='int64')  # the default dtype is float
enc.fit([['b', 'c', 'd'], ['a', 'd', 'f']])
print(enc.transform([['a', 'd', 'f']]).toarray())
# Output:
[[1 0 0 1 0 1]]
Solution 2:[2]
Try
enc = enc.fit([['b', 'c', 'd'], ['a', 'd', 'f']])
enc.data #--> array(['1.0', '1.0', '1.0'], dtype='<U32')
enc.data.tolist()
Solution 3:[3]
OneHotEncoder expects the input data to be numeric before it converts it to dummy variables. You can therefore use LabelEncoder's fit_transform to convert your categorical data to numbers first.
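A minimal sketch of the LabelEncoder route, assuming the two-row data from the question. LabelEncoder handles one 1-D column at a time, so each column gets its own encoder; the names encode_columns, label_encoders, and X_int are illustrative and not part of sklearn.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

X = np.array([['b', 'c', 'd'], ['a', 'd', 'f']])

# Fit one LabelEncoder per column, since LabelEncoder only accepts 1-D input.
label_encoders = [LabelEncoder().fit(X[:, j]) for j in range(X.shape[1])]

def encode_columns(rows):
    """Map each string column to integer codes using the fitted encoders."""
    return np.column_stack(
        [le.transform(rows[:, j]) for j, le in enumerate(label_encoders)]
    )

X_int = encode_columns(X)  # integer codes, one column per original feature

# One-hot encode the integer codes.
one_hot = OneHotEncoder()
one_hot.fit(X_int)

row = np.array([['a', 'd', 'f']])
result = one_hot.transform(encode_columns(row)).toarray()
print(result)
```

The same fitted encoders must be reused for any new rows, otherwise the integer codes will not line up with the columns learned during fit.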
A neater way to do this is LabelBinarizer, which works like applying LabelEncoder and then OneHotEncoder:
from sklearn.preprocessing import LabelBinarizer
encoder = LabelBinarizer()
encoder = encoder.fit(['a', 'd', 'f'])
print(encoder.transform(['a','d','f']))
Solution 4:[4]
I get the correct output with sklearn 0.21, so the problem was the sklearn version. Thanks, all!
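To check which version is installed, a small sketch like the following may help. It assumes that OneHotEncoder accepts string categories from sklearn 0.20 onward, which matches the observation above; the helper name supports_string_input is hypothetical.

```python
import re

import sklearn
from sklearn.preprocessing import OneHotEncoder

def supports_string_input(version):
    """Return True if the version string is 0.20 or newer (assumed cutoff)."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        return False
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (0, 20)

print(sklearn.__version__)

if supports_string_input(sklearn.__version__):
    # Newer releases can fit directly on string categories.
    enc = OneHotEncoder()
    enc.fit([['b', 'c', 'd'], ['a', 'd', 'f']])
    print(enc.transform([['a', 'd', 'f']]).toarray())
else:
    print("Upgrade scikit-learn, or integer-encode the columns first.")
```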
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | qaiser |
| Solution 2 | oppressionslayer |
| Solution 3 | Eric Cartman |
| Solution 4 | wa007 |
