How to encode pandas data frame column with three values fast?
I have a pandas data frame that contains a column called Country. I have more than a million rows in my data frame.
Country
USA
Canada
Japan
India
Brazil
......
I want to create a new column called Country_Encode, which will replace USA with 1, Canada with 2, and all others with 0 like the following.
Country   Country_Encode
USA       1
Canada    2
Japan     0
India     0
Brazil    0
..................
I have tried the following.
for idx, row in df.iterrows():
    if (df.loc[idx, 'Country'] == 'USA'):
        df.loc[idx, 'Country_Encode'] = 1
    elif (df.loc[idx, 'Country'] == 'Canada'):
        df.loc[idx, 'Country_Encode'] = 2
    elif ((df.loc[idx, 'Country'] != 'USA') and (df.loc[idx, 'Country'] != 'Canada')):
        df.loc[idx, 'Country_Encode'] = 0
The above solution works but it is very slow. Do you know how I can do it in a fast way? I really appreciate any help you can provide.
Solution 1:[1]
Assuming each row holds a single country name, the two conditions can never both be true, so you can assign the values in a vectorized way by combining boolean masks arithmetically:
df['Country_encode'] = df['Country'].eq('USA') + df['Country'].eq('Canada')*2
Output:
  Country  Country_encode
0     USA               1
1  Canada               2
2   Japan               0
3   India               0
4  Brazil               0
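To make it clearer why the arithmetic works, here is a small self-contained sketch of the same idea with the intermediate masks spelled out. The five-row frame and the names `is_usa` and `is_canada` are made up for illustration; only the `Country` column comes from the question.

```python
import pandas as pd

# Illustrative sample data; the real frame has over a million rows.
df = pd.DataFrame({"Country": ["USA", "Canada", "Japan", "India", "Brazil"]})

# Each comparison produces a boolean Series over the whole column at once.
is_usa = df["Country"].eq("USA")
is_canada = df["Country"].eq("Canada")

# Cast the masks to integers (True -> 1, False -> 0). Because the two masks
# are never both True on the same row, the weighted sum is 1, 2, or 0.
df["Country_encode"] = is_usa.astype(int) + is_canada.astype(int) * 2

print(df)
```

The explicit `astype(int)` is optional here, but it makes the integer result type obvious to a reader.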
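More generally, assigning through loc with a boolean mask is also vectorized and very fast. Set the default value first, then overwrite the matching rows: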
df['Country_encode'] = 0
df.loc[df['Country'].eq('USA'), 'Country_encode'] = 1
df.loc[df['Country'].eq('Canada'), 'Country_encode'] = 2
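If more values need their own code later, the same default-then-override logic can also be written with `numpy.select`. This is a different technique from the `loc` assignments above, not part of the original answer, and the sample frame is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative sample data standing in for the real frame.
df = pd.DataFrame({"Country": ["USA", "Canada", "Japan", "India", "Brazil"]})

# One boolean mask per encoded value, paired by position with its code.
conditions = [df["Country"].eq("USA"), df["Country"].eq("Canada")]
codes = [1, 2]

# Rows matching no condition receive the default value 0.
df["Country_encode"] = np.select(conditions, codes, default=0)

print(df)
```

Adding another country then only means appending one mask and one code to the two lists.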
Solution 2:[2]
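There are many ways to do this. The most basic one is to write a function that maps a single value and apply it to the column:
def coding(country):
    # Series.apply passes one column value at a time, not a whole row
    if country == "USA":
        return 1
    elif country == "Canada":
        return 2
    else:
        return 0

df["Country_code"] = df["Country"].apply(coding)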
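Since `apply` calls the Python function once per element, it can still be slow on a million rows. As a possible alternative (not part of the original answer), the same lookup can be sketched with `Series.map` and a dictionary. The dictionary name `codes` and the sample frame are assumptions for illustration.

```python
import pandas as pd

# Illustrative sample data standing in for the real frame.
df = pd.DataFrame({"Country": ["USA", "Canada", "Japan", "India", "Brazil"]})

# Hypothetical lookup table: only the countries that get a non-zero code.
codes = {"USA": 1, "Canada": 2}

# map() yields NaN for countries missing from the dictionary, so fill those
# with 0 and convert back to integers (NaN forces a float column otherwise).
df["Country_code"] = df["Country"].map(codes).fillna(0).astype(int)

print(df)
```

This keeps the mapping in one place as data rather than in a chain of `if`/`elif` branches.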
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Omar |
