'Search values in a Pandas DataFrame with values from another DataFrame

I have 2 dataframes.

df_dora

	content	feature	id
1	cyber hygien	risk management	1
2	cyber risk	risk management	2
...	...	... ...
59	intellig share	information sharing	63
60	inform share	information sharing	64

df_corpus

	content	id	meta.name	meta._split_id
0	market grow cyber attack...	56a2a2e28954537131a4aa734f49e361	14_Group_AG_2021	0
1	sec form file index	7aedfd4df02687d3dff9897c925da508	14_Group_AG_2021	1
	...	...	...	...
213769	cyber secur alert parent compani fina...	ab10325601597f203f3f0af7aa647112	17_La_Banque_2021	8581
213770	intellig share statement parent compani fina...	6af5687ac31849d19d2048e0b2ca472d	17_La_Banque_2021	8582

I am trying to extract a count of each term listed in df_dora.content within df_corpus.content grouped by df_content.meta.name. I tried to use isin

df = df_corpus[df_corpus.content.isin(df_dora.content)]

len(df)

Returns only 17 rows

	content	id	meta.name	meta
41474	incid	a4c478e0fad1b9775c05e01d871b3aaf	3_Agricole_2021	10185
68690	oper risk	2e5139d82c242c89523110cc1110647a	10_Banking_Group_PLC_2021	5525
...	...	...	...	...
99259	risk report	a84eefb9a4772d13eb67f2d6ae5215cb	31_Building_Society_2021	4820
105662	risk manag	e8050be841fedb6dd10599e8b4892a9f	43_Bank_SA_2021	131

df_corpus.loc[df_corpus.content.isin(df_dora.content), 'content'].tolist()

also returns 17 rows

if I search for 2 of the terms that exist in df_dora directly in df_corpus resiliency_term = df_corpus.loc[df_corpus['content'].str.contains("cyber risk|inform share", case=False)]

print(resiliency_term)

I get 243 rows (which matches what was in the original file.)

So given the above...my question is this how do I extract a count of each term listed in df_dora.content within df_corpus.content grouped by df_content.meta.name.

Thanks in advance for any help.

pandas dataframe

Solution 1:^[1]

unique_vals = '|'.join(df_dora.content.unique())
df_corpus.groupby('meta.name').apply(lambda x: x.content.str.findall(unique_vals).explode().value_counts())

Output given your four lines of each:

17_La_Banque_2021  intellig share    1
Name: content, dtype: int64

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	BeRT2me

'Search values in a Pandas DataFrame with values from another DataFrame

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]