Subsetting pandas dataframe if column contains only ONE instance of string
I have the following data frame and only want to keep the rows where the summary column contains exactly ONE instance of '->'. How can I do this in pandas?
Input:
idx summary
0 McDonalds -> Wendys -> Popeyes
1 Popeyes -> Taco Bell
2 Carls Jr -> Arbys
3 Arbys -> Popeyes -> Taco Bell -> KFC
4 KFC -> Popeyes -> Boston Market
Expected Output:
idx summary
1 Popeyes -> Taco Bell
2 Carls Jr -> Arbys
Solution 1:[1]
str.count('->') counts the occurrences of '->' in each row, so comparing the result with 1 gives a boolean mask that is True only for rows containing exactly one arrow. Passing that mask to loc returns the matching rows themselves, instead of a column of True or False values.
df_new = df.loc[df["summary"].str.count('->') == 1]
print(df_new)
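For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample data from the question and applies the same filter. The construction of `df` and the intermediate name `mask` are assumptions for illustration; the original post does not show how the frame was created.

```python
import pandas as pd

# Sample data copied from the question; building it this way is an assumption.
df = pd.DataFrame({
    "summary": [
        "McDonalds -> Wendys -> Popeyes",
        "Popeyes -> Taco Bell",
        "Carls Jr -> Arbys",
        "Arbys -> Popeyes -> Taco Bell -> KFC",
        "KFC -> Popeyes -> Boston Market",
    ]
})

# True only where the summary contains exactly one '->'.
mask = df["summary"].str.count("->") == 1

# Keep the matching rows; the original index labels are preserved.
df_new = df.loc[mask]
print(df_new)
```

The rows that survive keep their original index labels (1 and 2), which matches the expected output in the question.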
Solution 2:[2]
If your input is stored in a variable named df, the following produces the expected result:
ct_arrow = df.apply(lambda x: x.summary.count('->'), axis=1)
df = df.loc[ct_arrow==1]
print(df)
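One practical difference worth noting: the row-wise apply above calls Python's own str.count on each value, so a missing summary would fail there because a missing value is not a string. The sketch below shows how the vectorized `.str.count` accessor handles that case. The data is hypothetical, and the names `ct_arrow` and `kept` are only illustrative.

```python
import pandas as pd

# Hypothetical data: the second row has no summary at all.
df = pd.DataFrame({
    "summary": [
        "Popeyes -> Taco Bell",
        None,
        "KFC -> Popeyes -> Boston Market",
    ]
})

# The .str accessor propagates the missing value instead of raising.
ct_arrow = df["summary"].str.count("->")

# The row with the missing summary does not pass the "equals 1" test,
# so it is dropped along with the two-arrow row.
kept = df.loc[ct_arrow.eq(1)]
print(kept)
```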
Solution 3:[3]
You can do that with the following boolean-mask filter:
df[df["summary"].str.count('->')==1]
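A caveat that may matter if the separator ever changes: `Series.str.count` interprets its argument as a regular expression. '->' contains no regex metacharacters, so the filter above behaves as a literal match. The sketch below uses a hypothetical ' | ' separator (not from the question) to show why `re.escape` can be a safer habit.

```python
import re
import pandas as pd

# Hypothetical summaries that use ' | ' as the separator instead of '->'.
s = pd.Series(["Popeyes | Taco Bell", "McDonalds | Wendys | Popeyes"])

# Unescaped, ' | ' is read as the regex "space OR space",
# so every single space is counted.
unescaped = s.str.count(" | ")

# Escaped, the pattern matches the literal three-character separator.
escaped = s.str.count(re.escape(" | "))

print(unescaped.tolist())
print(escaped.tolist())

# Rows with exactly one literal separator.
print(s[escaped == 1])
```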
Solution 4:[4]
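As @mozway suggested, use your previous dataframe as the starting point. Replacing 'N/A' with pd.NA lets count(axis=1) tally the real stops in each row, and eq(2) keeps the rows with exactly two stops: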
>>> df
first_stop second_stop third_stop
0 mcdonalds burger king popeyes
1 mcdonalds N/A N/A
2 wendys kfc N/A
3 taco bell kfc wendys
4 popeyes kfc panda express
>>> df[df.replace('N/A', pd.NA).count(axis=1).eq(2)]
first_stop second_stop third_stop
2 wendys kfc N/A
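The session above can be reproduced with the sketch below. It assumes the 'N/A' entries are literal strings, as the replace call implies; the way the frame is constructed and the names `stops_per_row` and `result` are illustrative assumptions.

```python
import pandas as pd

# Frame rebuilt from the printed output; 'N/A' is assumed to be a literal string.
df = pd.DataFrame({
    "first_stop": ["mcdonalds", "mcdonalds", "wendys", "taco bell", "popeyes"],
    "second_stop": ["burger king", "N/A", "kfc", "kfc", "kfc"],
    "third_stop": ["popeyes", "N/A", "N/A", "wendys", "panda express"],
})

# Turn the 'N/A' placeholders into real missing values,
# then count the non-missing cells in each row.
stops_per_row = df.replace("N/A", pd.NA).count(axis=1)

# Keep the rows with exactly two stops. The filter indexes the original df,
# so the 'N/A' strings still appear in the result.
result = df[stops_per_row.eq(2)]
print(result)
```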
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | AlecZ |
| Solution 3 | BoomBoxBoy |
| Solution 4 | Corralien |
