Remove records from a dataframe that exist in another column, keeping some based on a specific priority, with Python

I want to plot on a map those names that do not already have a neighbour, so I need to remove from my dataframe all names that already have a known neighbour, preserving those with the highest age.

import pandas as pd

df = pd.DataFrame(
    list(zip(
        ['Isabel Garcia', 'Isabel Garcia', 'Raul Jimenez', 'Laura Gomez', 'Laura Gomez',
         'Maria Garcia', 'Paco Garcia', 'Isa Gomez', 'Lucas Gomez', 'Roberto Sanchez'],
        [65, 65, 55, 50, 50, 45, 44, 30, 25, 40],
        ['Maria Garcia', 'Paco Garcia', '', 'Isa Gomez', 'Lucas Gomez',
         'Isabel Garcia', 'Isabel Garcia', 'Laura Gomez', 'Laura Gomez', '']
    )),
    columns=['Name', 'Age', 'Neighbour'])
df

I want to get this as my final result:

pd.DataFrame(['Isabel Garcia', 'Raul Jimenez', 'Laura Gomez', 'Roberto Sanchez'], columns=['Name'])

I already tried looping over the dataframe, appending all neighbours to a list while excluding the names I had already looped over and want to preserve, but everything I tried ends up removing all values.

lsss = []   # names already looped over
lsss1 = []  # neighbours of the names already looped over
lsss2 = []  # neighbours to remove; initialised so the filter below does not raise NameError

for idx, row in df[~df['Name'].isin(lsss2)].iterrows():
    lsss.append(row['Name'])  # names looped over
    # list of neighbours of those looped over
    lsss1.append(df[df['Name'] == row['Name']]['Neighbour'])
    # flatten list and keep unique values
    flat_list = list(dict.fromkeys([item for sublist in lsss1 for item in sublist]))
    lsss2 = flat_list  # list of neighbours
    for e in lsss:  # remove from list those to keep
        try:
            lsss2.remove(e)
        except ValueError:
            pass

    df[~df['Name'].isin(lsss2)]

python pandas

Solution 1:^[1]

I'm not sure whether this fully covers your use case, but here is a way to get your expected output.

known = set()                    # names seen so far
remaining_ppl = set(df['Name'])  # candidates to keep
for _, row in df.iterrows():
    known.add(row['Name'])
    # drop this name if its neighbour has already been seen
    if row['Neighbour'] in known:
        remaining_ppl.discard(row['Name'])

print(remaining_ppl)  # remaining_ppl is a set, so the printed order is not guaranteed

Output:
{'Raul Jimenez', 'Isabel Garcia', 'Roberto Sanchez', 'Laura Gomez'}
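
The loop above visits the rows in whatever order they already have, and a name is dropped only when its neighbour appeared in an earlier row. That matches the "highest age wins" priority only because the example data happens to be listed from oldest to youngest. If that ordering cannot be relied on, one option is to sort by Age first. A minimal sketch, assuming age is the only priority; names_to_keep is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Isabel Garcia', 'Isabel Garcia', 'Raul Jimenez', 'Laura Gomez', 'Laura Gomez',
             'Maria Garcia', 'Paco Garcia', 'Isa Gomez', 'Lucas Gomez', 'Roberto Sanchez'],
    'Age': [65, 65, 55, 50, 50, 45, 44, 30, 25, 40],
    'Neighbour': ['Maria Garcia', 'Paco Garcia', '', 'Isa Gomez', 'Lucas Gomez',
                  'Isabel Garcia', 'Isabel Garcia', 'Laura Gomez', 'Laura Gomez', ''],
})


def names_to_keep(frame):
    """Hypothetical helper: same loop as above, but on rows sorted oldest first."""
    # mergesort is stable, so rows with equal Age keep their relative order
    ordered = frame.sort_values('Age', ascending=False, kind='mergesort')
    known = set()
    remaining = set(ordered['Name'])
    for name, neighbour in zip(ordered['Name'], ordered['Neighbour']):
        known.add(name)
        # the neighbour was seen earlier, so it is at least as old: drop this name
        if neighbour in known:
            remaining.discard(name)
    return remaining


# Shuffle the rows to show the result no longer depends on the input order
shuffled = df.sample(frac=1, random_state=0)
print(names_to_keep(shuffled))
```

Iterating with zip over the two columns instead of iterrows() is only a convenience here; the logic is unchanged.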

If you want the filtered df, you could do:

df.loc[df['Name'].isin(remaining_ppl), :]

Output:
    Name            Age Neighbour
0   Isabel Garcia   65  Maria Garcia
1   Isabel Garcia   65  Paco Garcia
2   Raul Jimenez    55  
3   Laura Gomez     50  Isa Gomez
4   Laura Gomez     50  Lucas Gomez
9   Roberto Sanchez 40
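
To get exactly the one-column frame asked for in the question, one option is to select only the Name column from the filtered rows and drop the duplicates. A minimal sketch: remaining_ppl is hard-coded to the set printed earlier so the snippet runs on its own, and result is just an illustrative variable name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Isabel Garcia', 'Isabel Garcia', 'Raul Jimenez', 'Laura Gomez', 'Laura Gomez',
             'Maria Garcia', 'Paco Garcia', 'Isa Gomez', 'Lucas Gomez', 'Roberto Sanchez'],
    'Age': [65, 65, 55, 50, 50, 45, 44, 30, 25, 40],
    'Neighbour': ['Maria Garcia', 'Paco Garcia', '', 'Isa Gomez', 'Lucas Gomez',
                  'Isabel Garcia', 'Isabel Garcia', 'Laura Gomez', 'Laura Gomez', ''],
})

# Copied from the set printed by the loop, so this snippet is self-contained
remaining_ppl = {'Raul Jimenez', 'Isabel Garcia', 'Roberto Sanchez', 'Laura Gomez'}

result = (
    df.loc[df['Name'].isin(remaining_ppl), ['Name']]  # keep only the Name column
    .drop_duplicates()                                # one row per name
    .reset_index(drop=True)                           # renumber rows from 0
)
print(result)
```

Because boolean indexing keeps the rows in their original order, the names come out in the same order as in df, even though remaining_ppl itself is unordered.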

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Rabinzel
