Remove rows from a DataFrame whose name exists in another column, but keep some based on a specific priority (Python)
I want to plot on a map only the names that do not already have a neighbour. To do that, I need to remove from my DataFrame every name that already has a known neighbour, keeping whichever name in each pair has the highest age.
import pandas as pd

df = pd.DataFrame(
    list(zip(
        ['Isabel Garcia', 'Isabel Garcia', 'Raul Jimenez', 'Laura Gomez', 'Laura Gomez',
         'Maria Garcia', 'Paco Garcia', 'Isa Gomez', 'Lucas Gomez', 'Roberto Sanchez'],
        [65, 65, 55, 50, 50, 45, 44, 30, 25, 40],
        ['Maria Garcia', 'Paco Garcia', '', 'Isa Gomez', 'Lucas Gomez', 'Isabel Garcia',
         'Isabel Garcia', 'Laura Gomez', 'Laura Gomez', '']
    )),
    columns=['Name', 'Age', 'Neighbour'])
df
I want to get this as my final result:
pd.DataFrame(['Isabel Garcia', 'Raul Jimenez', 'Laura Gomez', 'Roberto Sanchez'], columns=['Name'])
I already tried looping over the DataFrame and appending to a list all neighbours except the names I have already looped over and want to keep, but everything I tried ends up removing all values.
lsss = []   # names looped over
lsss1 = []  # neighbours of the names looped over
lsss2 = []  # names to exclude; must exist before the loop starts
for idx, row in df[~df['Name'].isin(lsss2)].iterrows():
    lsss.append(row['Name'])  # names looped over
    # list of neighbours of those looped over
    lsss1.append(df[df['Name'] == row['Name']]['Neighbour'])
    # flatten list and keep unique values
    flat_list = list(dict.fromkeys([item for sublist in lsss1 for item in sublist]))
    lsss2 = flat_list  # copy list of neighbours
    for e in lsss:  # remove from list those to keep
        try:
            lsss2.remove(e)
        except ValueError:
            pass
df[~df['Name'].isin(lsss2)]
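One likely reason a loop like the one above cannot work as intended: `df[~df['Name'].isin(lsss2)]` is evaluated once, when the `for` statement starts, so names added to `lsss2` inside the loop never change which rows are visited. A minimal sketch illustrating this with a toy frame (the values `'a'`, `'b'`, `'c'` and the `exclude`/`visited` lists are made up for the demonstration):

```python
import pandas as pd

# Hypothetical toy frame, only used to show how the iteration behaves
df = pd.DataFrame({'Name': ['a', 'b', 'c']})
exclude = []
visited = []

# The filtered DataFrame is built once, before the first iteration
for _, row in df[~df['Name'].isin(exclude)].iterrows():
    visited.append(row['Name'])
    exclude.append('c')  # too late: the rows to iterate over are already fixed

print(visited)  # ['a', 'b', 'c']
```

Because of this, any exclusion has to be checked inside the loop body (as the answer below does with a set), not in the expression the loop iterates over.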
Solution 1:[1]
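I'm not sure whether this is satisfying for you, but here is a way to get your expected output. It walks through the rows in order, remembers every name seen so far, and drops a name as soon as one of its neighbours has already been seen.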
known = set()
remaining_ppl = set(df['Name'])
for _, row in df.iterrows():
    known.add(row['Name'])
    if row['Neighbour'] in known:
        remaining_ppl.discard(row['Name'])
print(remaining_ppl)  # because remaining_ppl is a set, it is not ordered
Output:
{'Raul Jimenez', 'Isabel Garcia', 'Roberto Sanchez', 'Laura Gomez'}
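The loop relies on row order: a name is dropped only if its neighbour appeared in an earlier (or the same) row, so older people must come first. In the sample data this holds for every pair of neighbours. If the real data is not ordered that way, a minimal sketch is to sort by `Age` in descending order before looping. This assumes each name always has the same age; ties between neighbours of equal age are resolved by whichever row comes first, which is why a stable sort (`kind='mergesort'`) is used.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Isabel Garcia', 'Isabel Garcia', 'Raul Jimenez', 'Laura Gomez', 'Laura Gomez',
             'Maria Garcia', 'Paco Garcia', 'Isa Gomez', 'Lucas Gomez', 'Roberto Sanchez'],
    'Age': [65, 65, 55, 50, 50, 45, 44, 30, 25, 40],
    'Neighbour': ['Maria Garcia', 'Paco Garcia', '', 'Isa Gomez', 'Lucas Gomez',
                  'Isabel Garcia', 'Isabel Garcia', 'Laura Gomez', 'Laura Gomez', ''],
})

# Oldest first, so the higher age always "claims" the pair;
# mergesort is stable, so rows with equal age keep their original relative order
ordered = df.sort_values('Age', ascending=False, kind='mergesort')

known = set()
remaining_ppl = set(df['Name'])
for _, row in ordered.iterrows():
    known.add(row['Name'])
    # A neighbour that was already seen is at least as old: drop the current name
    if row['Neighbour'] in known:
        remaining_ppl.discard(row['Name'])

# One row per surviving name, in the original row order
result = (df.loc[df['Name'].isin(remaining_ppl), ['Name']]
            .drop_duplicates()
            .reset_index(drop=True))
print(result)
```

With the explicit sort, the result no longer depends on how the rows happen to be arranged in the input.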
If you want the filtered DataFrame instead, you could do:
df.loc[df['Name'].isin(remaining_ppl), :]
Output:
              Name  Age     Neighbour
0    Isabel Garcia   65  Maria Garcia
1    Isabel Garcia   65   Paco Garcia
2     Raul Jimenez   55
3      Laura Gomez   50     Isa Gomez
4      Laura Gomez   50   Lucas Gomez
9  Roberto Sanchez   40
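As an alternative without an explicit loop, one could compare each row's age with its neighbour's age directly and drop every name that has a strictly older neighbour. Note that this is a different technique from the answer above (a pairwise age comparison rather than an ordered scan). It is a sketch that assumes each name always has the same age, and it keeps both names when neighbours have equal age, so ties would need an extra rule. On the sample data it gives the expected result.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Isabel Garcia', 'Isabel Garcia', 'Raul Jimenez', 'Laura Gomez', 'Laura Gomez',
             'Maria Garcia', 'Paco Garcia', 'Isa Gomez', 'Lucas Gomez', 'Roberto Sanchez'],
    'Age': [65, 65, 55, 50, 50, 45, 44, 30, 25, 40],
    'Neighbour': ['Maria Garcia', 'Paco Garcia', '', 'Isa Gomez', 'Lucas Gomez',
                  'Isabel Garcia', 'Isabel Garcia', 'Laura Gomez', 'Laura Gomez', ''],
})

# Age of each person, one entry per name (assumes a name always has the same age)
age_of = df.drop_duplicates('Name').set_index('Name')['Age']

# Age of each row's neighbour; NaN when the neighbour is empty or not listed as a Name
neighbour_age = df['Neighbour'].map(age_of)

# A name is dropped when any of its neighbours is strictly older (NaN compares as False)
losers = set(df.loc[neighbour_age > df['Age'], 'Name'])

result = (df.loc[~df['Name'].isin(losers), ['Name']]
            .drop_duplicates()
            .reset_index(drop=True))
print(result)
```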
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rabinzel |


