'fetching substring with a condition from another df
I have 2 data sets, 1 with only address like this
import pandas as pd
import numpy as np
df = pd.DataFrame({"Address": ["36 omar st, pal, galambo","33 pom kopd malan", "15 kop st,dogg, ghog", "23 malo st, pal, kola"]})
Address
0 36 omar st, pal, galambo
1 33 pom kopd malan
2 15 kop st,dogg, ghog
3 23 malo st, pal, kola
and the other is a dataset with every state and the cities inside of it
df2 = pd.DataFrame({"State": ["galambo", "ghog", "ghog", "kola", "malan", "malan"], "City": ["pal", "dogg", "kopd", "kop", "pal", "kold"]})
State City
0 galambo pal
1 ghog dogg
2 ghog kopd
3 kola kop
4 malan pal
5 malan kold
I'm trying to fetch state name and city name out of each address, so I tried this
df["State"] = df['Address'].apply(lambda x: next((a for a in df2["State"].to_list() if a in x), np.nan))
df["City"] = df['Address'].apply(lambda x: next((a for a in df2["City"].to_list() if a in x), np.nan))
Address State City
0 36 omar st, pal, galambo galambo pal
1 33 pom kopd malan malan kopd
2 15 kop st,dogg, ghog ghog dogg
3 23 malo st, pal, kola kola pal
but as you see, the rows 1,3 are incorrect because according to df2 the State malan has no City called kopd, and State kola has no City called pal
so how can I make the output shows only the cities that are in the States as suggested in df2?
Update: Expected output
Address State City
0 36 omar st, pal, galambo galambo pal
1 33 pom kopd malan malan NaN
2 15 kop st,dogg, ghog ghog dogg
3 23 malo st, pal, kola kola NaN
Solution 1:[1]
You can extract the last matching state/city name, then perform a merge to replace the invalid cities by NaN:
# craft regexes
regex_state = f"({'|'.join(df2['State'].unique())})"
regex_city = f"({'|'.join(df2['City'].unique())})"
# extract state/city (last match)
df['State'] = df['Address'].str.findall(regex_state).str[-1]
df['City'] = df['Address'].str.findall(regex_city).str[-1]
# fix city
df['City'] = df.merge(df2.assign(c=df2['City']), on=['City', 'State'], how='left')['c']
Output:
Address State City
0 36 omar st, pal, galambo galambo pal
1 33 pom kopd malan malan NaN
2 15 kop st,dogg, ghog ghog dogg
3 23 malo st, pal, kola kola NaN
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mozway |
