OneHotEncoder fit error: ValueError: could not convert string to float: b

import numpy as np
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(dtype=np.str)
enc.fit([['b', 'c', 'd'], ['a', 'd', 'f']])
print(enc.transform([['a', 'd', 'f']]).toarray())

ValueError: could not convert string to float: b

sklearn version: 0.19.2

It also fails with dtype=np.int64.



Solution 1:[1]

The dtype parameter of OneHotEncoder sets the desired output type, and only numeric types are supported. You are passing np.str, which is why you get the error.

import numpy as np
from sklearn.preprocessing import OneHotEncoder
enc = OneHotEncoder(dtype = 'int64') #by default dtype is float 
enc.fit([['b', 'c', 'd'], ['a', 'd', 'f']])
print(enc.transform([['a', 'd', 'f']]).toarray())
# Output:
# [[1 0 0 1 0 1]]
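
To make the role of dtype more concrete, here is a minimal sketch that assumes scikit-learn 0.20 or later (where string categories are accepted directly). The names enc_float and enc_int are illustrative only. It suggests that dtype changes the element type of the encoded output, not what kind of input the encoder accepts:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

X = [['b', 'c', 'd'], ['a', 'd', 'f']]

# dtype only controls the element type of the encoded output matrix.
enc_float = OneHotEncoder().fit(X)              # default output dtype: float64
enc_int = OneHotEncoder(dtype=np.int64).fit(X)  # integer output

out_float = enc_float.transform([['a', 'd', 'f']]).toarray()
out_int = enc_int.transform([['a', 'd', 'f']]).toarray()

print(out_float.dtype, out_int.dtype)
print(enc_int.categories_)  # categories learned for each input column
```

Both encoders learn the same categories; only the dtype of the resulting array differs.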

Solution 2:[2]

Try

enc = enc.fit([['b', 'c', 'd'], ['a', 'd', 'f']])
enc.data #--> array(['1.0', '1.0', '1.0'], dtype='<U32')
enc.data.tolist()

Solution 3:[3]

In sklearn versions before 0.20, OneHotEncoder expects the input data to be numeric before it converts them to dummy variables. You might therefore want to call fit_transform with LabelEncoder first, to convert your categorical data to numbers.

A neater way to do this is LabelBinarizer, which works like applying LabelEncoder and then OneHotEncoder:

from sklearn.preprocessing import LabelBinarizer
encoder = LabelBinarizer()
encoder = encoder.fit(['a', 'd', 'f'])
print(encoder.transform(['a','d','f']))
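
The LabelEncoder route described in this answer can be sketched as follows for the multi-column data in the question. This is only an illustration: the helper to_codes and the list label_encoders are hypothetical names, and fitting one LabelEncoder per column is an assumption about how the answer would be applied to several features.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

X = np.array([['b', 'c', 'd'], ['a', 'd', 'f']])

# Fit one LabelEncoder per column so each feature gets its own integer codes.
label_encoders = [LabelEncoder().fit(X[:, j]) for j in range(X.shape[1])]

def to_codes(rows):
    """Map each string column to integer codes (hypothetical helper)."""
    rows = np.asarray(rows)
    return np.column_stack(
        [le.transform(rows[:, j]) for j, le in enumerate(label_encoders)]
    )

# One-hot encode the integer codes instead of the raw strings.
onehot = OneHotEncoder().fit(to_codes(X))
encoded = onehot.transform(to_codes([['a', 'd', 'f']])).toarray()
print(encoded)
```

Note that LabelBinarizer operates on a single 1-D array of labels, so for several feature columns a per-column approach like this one is needed.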

Solution 4:[4]

I get the correct output with sklearn 0.21, so the problem was the sklearn version. Thanks, all!
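
A quick way to check whether the installed version is affected is sketched below. It assumes that direct support for string categories arrived in scikit-learn 0.20; the variable supports_strings is an illustrative name.

```python
import sklearn
from sklearn.preprocessing import OneHotEncoder

# Compare only the major and minor components of the installed version.
major, minor = (int(part) for part in sklearn.__version__.split('.')[:2])
supports_strings = (major, minor) >= (0, 20)  # assumption: strings accepted from 0.20
print(sklearn.__version__, supports_strings)

if supports_strings:
    enc = OneHotEncoder()
    enc.fit([['b', 'c', 'd'], ['a', 'd', 'f']])
    result = enc.transform([['a', 'd', 'f']]).toarray()
    print(result)
```

On an older installation, upgrading scikit-learn or label-encoding the columns first are the two options discussed in this thread.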

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution sources:
[1] Solution 1: qaiser
[2] Solution 2: oppressionslayer
[3] Solution 3: Eric Cartman
[4] Solution 4: wa007