Get the label mappings from label encoder
I am using the following code to map a list of string labels to a list of one-hot-encoded values:
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
labelEncoder = LabelEncoder()
targets = ["blue","green","blue","blue","green"]
integerEncoded = labelEncoder.fit_transform(targets)
At a later stage I need to know exactly which string labels are mapped to which integer values.
That is, I need something like this:
integerMapping = GetIntegerMapping(labelEncoder)
Where
integerMapping["blue"]
should return the int value to which all "blue" labels are mapped
and
integerMapping["green"]
should return the int value to which all "green" labels are mapped.
How can I get that integerMapping dictionary?
Solution 1:[1]
Once the label encoder is fitted, it exposes a classes_ attribute that holds the sorted unique labels. The integer used to replace a label is the index of that label in this array, so you can get the mapping with:
le = LabelEncoder()
le.fit(targets)
integer_mapping = {l: i for i, l in enumerate(le.classes_)}
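As a self-contained sketch of the approach above, the snippet below builds the mapping from classes_, cross-checks it against the encoder's own transform, and derives the reverse lookup. The name inverse_mapping and the str() conversion of the keys are illustrative additions, not part of the original answer.

```python
from sklearn.preprocessing import LabelEncoder

targets = ["blue", "green", "blue", "blue", "green"]

le = LabelEncoder()
le.fit(targets)

# classes_ holds the sorted unique labels; each label's position is its code.
# str() turns the NumPy string keys into plain Python strings.
integer_mapping = {str(label): index for index, label in enumerate(le.classes_)}

# Cross-check: the encoder must assign the same code to every known label.
codes = le.transform(le.classes_)
assert all(integer_mapping[str(label)] == int(code)
           for label, code in zip(le.classes_, codes))

# Reverse lookup (integer code -> label), added here for illustration.
inverse_mapping = {index: label for label, index in integer_mapping.items()}

print(integer_mapping["blue"], integer_mapping["green"])
```

Because the mapping comes from the fitted encoder rather than from any particular batch of data, it covers every label the encoder knows about.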
Solution 2:[2]
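You can build a dictionary that maps each target label to its encoded integer. This relies on integerEncoded being the result of transforming targets, so that the two sequences are aligned element by element:
integerMapping = dict(zip(targets, integerEncoded))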
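A runnable version of this idea might look like the sketch below. The snake_case names and the .tolist() call are assumptions added here; .tolist() converts the NumPy integers into built-in ints so the dictionary prints and serializes cleanly.

```python
from sklearn.preprocessing import LabelEncoder

targets = ["blue", "green", "blue", "blue", "green"]

label_encoder = LabelEncoder()
integer_encoded = label_encoder.fit_transform(targets)

# Pair each original label with its code; duplicate labels simply
# overwrite the same key with the same value.
integer_mapping = dict(zip(targets, integer_encoded.tolist()))

print(integer_mapping)  # {'blue': 0, 'green': 1}
```

Note that this only contains the labels present in the list that was zipped. If the encoder was fitted on a larger set of labels than the one being transformed, the classes_-based mapping from Solution 1 is the more complete choice.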
Solution 3:[3]
Here is a simple answer:
import pandas as pd

# helper function to get the mapping between original label and encoded label
def get_label_map(df: pd.DataFrame, label: str):
    """get the mapping between original label and its encoded value
    df: a pandas dataframe with both feature variables and target variable
    label: the name of target variable
    Example:
        df0 = pd.DataFrame({'fea1':[1,2,3,4], 'fea2':['a','b','b','c'], 'target':['cat', 'cat','dog','cat']})
        label = 'target'
        label_map = get_label_map(df=df0, label='target')
    """
    from sklearn.preprocessing import LabelEncoder
    le = LabelEncoder()  # init label encoder
    # pass the column as a 1-D Series; df[[label]] is 2-D, which LabelEncoder does not expect
    y_le = le.fit_transform(df[label])  # encode target variable
    label_map = dict(zip(df[label], y_le))  # get the mapping between the original labels and encoded labels
    return label_map
Example:
df0 = pd.DataFrame({'fea1':[1,2,3,4], 'fea2':['a','b','b','c'], 'target':['cat', 'cat','dog','monkey']})
label_map = get_label_map(df=df0, label='target') # {'cat': 0, 'dog': 1, 'monkey': 2}
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Alexis M |
| Solution 2 | |
| Solution 3 | |
