'fetching substring with a condition from another df

I have 2 data sets, 1 with only address like this

import pandas as pd
import numpy as np

df = pd.DataFrame({"Address": ["36 omar st, pal, galambo","33 pom kopd malan", "15 kop st,dogg, ghog", "23 malo st, pal, kola"]})

                    Address
0  36 omar st, pal, galambo
1         33 pom kopd malan
2      15 kop st,dogg, ghog
3     23 malo st, pal, kola

and the other is a dataset with every state and the cities inside of it

df2 = pd.DataFrame({"State": ["galambo", "ghog", "ghog", "kola", "malan", "malan"], "City": ["pal", "dogg", "kopd", "kop", "pal", "kold"]})

     State  City
0  galambo   pal
1     ghog  dogg
2     ghog  kopd
3     kola   kop
4    malan   pal
5    malan  kold

I'm trying to fetch state name and city name out of each address, so I tried this

df["State"] = df['Address'].apply(lambda x: next((a for a in df2["State"].to_list() if a in x), np.nan))
df["City"] = df['Address'].apply(lambda x: next((a for a in df2["City"].to_list() if a in x), np.nan))

                    Address    State  City
0  36 omar st, pal, galambo  galambo   pal
1         33 pom kopd malan    malan  kopd
2      15 kop st,dogg, ghog     ghog  dogg
3     23 malo st, pal, kola     kola   pal

but as you see, the rows 1,3 are incorrect because according to df2 the State malan has no City called kopd, and State kola has no City called pal

so how can I make the output shows only the cities that are in the States as suggested in df2?

Update: Expected output

                    Address    State  City
0  36 omar st, pal, galambo  galambo   pal
1         33 pom kopd malan    malan  NaN
2      15 kop st,dogg, ghog     ghog  dogg
3     23 malo st, pal, kola     kola   NaN


Solution 1:[1]

You can extract the last matching state/city name, then perform a merge to replace the invalid cities by NaN:

# craft regexes
regex_state = f"({'|'.join(df2['State'].unique())})"
regex_city = f"({'|'.join(df2['City'].unique())})"

# extract state/city (last match)
df['State'] = df['Address'].str.findall(regex_state).str[-1]
df['City'] = df['Address'].str.findall(regex_city).str[-1]

# fix city
df['City'] = df.merge(df2.assign(c=df2['City']), on=['City', 'State'], how='left')['c']

Output:

                    Address    State  City
0  36 omar st, pal, galambo  galambo   pal
1         33 pom kopd malan    malan   NaN
2      15 kop st,dogg, ghog     ghog  dogg
3     23 malo st, pal, kola     kola   NaN

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mozway