'Replace substrings from a Dataframe column which correspond to values of another dataframe column with values of a third column
I have this huge dataset in which I have to replace each country's name with the corresponding ISO code. I have stored the ISO code of each country into another df. e.g.
df1:
| TERRITORY |
|---|
| France, Italy |
| Italy |
| Spain, France |
| France, Spain, Italy |
df2:
| COUNTRY | ISO CODE |
|---|---|
| France | FR |
| Italy | IT |
| Spain | ES |
Expected output:
| TERRITORY |
|---|
| FR, IT |
| IT |
| ES, FR |
| FR, ES, IT |
My last try was to convert the element into tuples and then replace, but it doesn't work (and I don't know if it makes sense, but here's my code anyway):
country = tuple(list(df2['COUNTRY']))
iso = tuple(list(df2['ISO CODE']))
z = zip(x, y)
for x, y in z:
if x in df1['TERRITORY']:
df1['TERRITORY'].str.replace(x, y)
But it doesn't change anything.
Solution 1:[1]
You can use a combination of .str.split + .explode, then .replace + .set_index, and finally .groupby(level=0) + agg(list) + .str.join:
df1['TERRITORY'] = df1['TERRITORY'].str.split(', ').explode().replace(df2.set_index('COUNTRY')['ISO CODE']).groupby(level=0).agg(list).str.join(', ')
Output:
>>> df1
TERRITORY
0 FR, IT
1 IT
2 ES, FR
3 FR, ES, IT
Solution 2:[2]
If you have all the mapping you can also just use replace():
df1.replace(df2.set_index('COUNTRY').squeeze(), regex=True)
Output:
Territory
0 FR, IT
1 IT
2 ES, FR
3 FR, ES, IT
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | richardec |
