How to reduce the number of columns after one-hot encoding

I am working with a dataset in which a categorical column has to be converted to a numeric equivalent, because I need to apply a couple of ML techniques to it. I used one-hot encoding (OHE) to convert the categorical column (Nationalities) into numeric columns suitable for machine learning models. However, this produces a total of 227 columns. Is there a way to reduce the number of columns obtained after applying OHE? Thanks.




Solution 1:[1]

You can use pd.factorize, which encodes each distinct value as an integer code in a single column:

df['Nationalities_numeric'] = pd.factorize(df['Nationalities'])[0]
print(df)

# Output
  Nationalities  Nationalities_numeric
0        France                      0
1         Spain                      1
2        Italia                      2
3        France                      0
4        Italia                      2
5       Germany                      3
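
pd.factorize also returns the unique labels as its second element, so the integer codes can be mapped back to the original names later. A minimal sketch reusing the sample data from the setup; the variable names codes, uniques and decoded are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'Nationalities': ['France', 'Spain', 'Italia',
                                     'France', 'Italia', 'Germany']})

# factorize returns (codes, uniques); codes follow the order of first appearance
codes, uniques = pd.factorize(df['Nationalities'])
df['Nationalities_numeric'] = codes

# uniques holds the labels; taking them by code recovers the original values
decoded = uniques.take(codes)
```

Keep in mind that these codes are arbitrary labels, not a ranking. Tree-based models usually cope with that, but linear or distance-based models may treat 3 > 0 as meaningful.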

Instead of pd.get_dummies, which creates one column per distinct value:

# dtype=int keeps the 0/1 output shown below (pandas 2.x defaults to booleans)
df = df.join(pd.get_dummies(df['Nationalities'], dtype=int))
print(df)

# Output
  Nationalities  France  Germany  Italia  Spain
0        France       1        0       0      0
1         Spain       0        0       0      1
2        Italia       0        0       1      0
3        France       1        0       0      0
4        Italia       0        0       1      0
5       Germany       0        1       0      0
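
If one-hot columns are still wanted, one common way to get fewer of them is to keep only the most frequent categories and lump the rest into a single bucket before calling pd.get_dummies. This is a sketch under assumptions: the cutoff top_n = 2 and the label 'Other' are arbitrary choices for illustration, not something taken from the question.

```python
import pandas as pd

df = pd.DataFrame({'Nationalities': ['France', 'Spain', 'Italia',
                                     'France', 'Italia', 'Germany']})

top_n = 2  # assumed cutoff; tune it for the real data

# labels of the top_n most frequent categories
keep = df['Nationalities'].value_counts().nlargest(top_n).index

# replace every label outside `keep` with a single 'Other' bucket
reduced = df['Nationalities'].where(df['Nationalities'].isin(keep), 'Other')

# one column per kept category plus one for 'Other'
dummies = pd.get_dummies(reduced, dtype=int)
```

With the sample data this leaves three columns (France, Italia, Other) instead of four. On a column with 227 distinct values the saving depends on how many categories are rare enough to merge.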

Setup:

import pandas as pd

df = pd.DataFrame({'Nationalities': ['France', 'Spain', 'Italia',
                                     'France', 'Italia', 'Germany']})

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Corralien