Using isin() doesn't work when trying to remove a dataframe's column items that are not found in a list

I'm trying to remove dataframe rows whose gene_ID is not found in a list, but it's not working for some reason. I couldn't find similar past questions that might help.

The code:

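import pandas as pd

# Read the file and split each line on whitespace
with open('file.txt', 'r') as file_:
    tab_delimited = [line.split() for line in file_.readlines()]

# The first row holds the column names; the remaining rows are data
df = pd.DataFrame(tab_delimited[1:], columns=tab_delimited[0]).fillna(value='')
df['gene_ID'] = df['gene_ID'].astype('string')
# a_list is the list of gene IDs sampled at the end of this post
df = df[~df['gene_ID'].isin(a_list)]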

I added astype('string') to convert the column to strings in case it wasn't already, but got the same result.
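For reference, here is a minimal sketch of how the isin() mask behaves with and without the ~ operator. The small dataframe and the shortened a_list are made-up stand-ins built from the samples in this post, not the real data. The ~ inverts the boolean mask, so df[~mask] keeps the rows that are NOT in the list, while df[mask] keeps the rows that are in it.

```python
import pandas as pd

# Hypothetical miniature of the data in this post
df = pd.DataFrame({
    "gene_ID": ["BAB_RS10420", "BAB_RS10425", "BAB_RS10435",
                "BAB_RS15955", "BAB_RS15960"],
    "RPKM": [186, 249, 31, 449, 328],
})
a_list = ["BAB_RS25775", "BAB_RS10425", "BAB_RS10420", "BAB_RS15960"]

# True where the gene_ID appears in a_list
mask = df["gene_ID"].isin(a_list)

# Keeps only rows found in the list (removes rows NOT in the list)
kept_in_list = df[mask]

# Keeps only rows absent from the list (removes rows that ARE in the list)
kept_not_in_list = df[~mask]

print(kept_in_list["gene_ID"].tolist())
print(kept_not_in_list["gene_ID"].tolist())
```

If the goal is to drop rows that are not in the list, the version without ~ appears to be the one that matches it.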

When I check whether a value that I know is in the column is actually found there, as follows:

a_list[0] in df['gene_ID']

I get False. I only get True after converting the column to a list:

a_list[0] in df['gene_ID'].to_list()
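The difference between these two checks can be reproduced on a tiny Series. The sketch below uses a made-up two-element Series. The in operator applied directly to a pandas Series tests the index labels (here 0 and 1), not the stored values, so a False result from that check does not mean the value is missing from the column.

```python
import pandas as pd

# Hypothetical two-row column with the default integer index (0, 1)
genes = pd.Series(["BAB_RS10420", "BAB_RS10425"], name="gene_ID")

# `in` on a Series looks at the index labels, not the values
in_index = "BAB_RS10420" in genes

# An index label is found this way
label_hit = 0 in genes

# Checking against the values requires a list or isin()
in_list = "BAB_RS10420" in genes.to_list()
in_values = bool(genes.isin(["BAB_RS10420"]).any())

print(in_index, label_hit, in_list, in_values)
```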

I'd love to get some ideas on what could be the issue.

Here's a sample of the input file:

gene_ID                  start_coord       end_coord     average_cov          #reads            RPKM

BAB_RS10420                      634            2274            99.8            2521             186
BAB_RS10425                     2578            3696           133.3            2295             249
BAB_RS10435                     3878            5032            16.4             291              31
BAB_RS15955                     5070            5852           240.7            2899             449
BAB_RS15960                     5869            6825           176.0            2591             328

and a sample of the list a_list:

['BAB_RS25775', 'BAB_RS25755', 'BAB_RS10425', 'BAB_RS25745', 'BAB_RS25735', 'BAB_RS25730', 'BAB_RS25725', 'BAB_RS10420', 'BAB_RS25710', 'BAB_RS25700', 'BAB_RS25695', 'BAB_RS25690', 'BAB_RS25675', 'BAB_RS25665', 'BAB_RS25660', 'BAB_RS25655', 'BAB_RS25650', 'BAB_RS25645', 'BAB_RS25640', 'BAB_RS25630', 'BAB_RS15960', 'BAB_RS25620', 'BAB_RS25610', 'BAB_RS25605', 'BAB_RS25600', 'BAB_RS25595', 'BAB_RS25585', 'BAB_RS25575', 'BAB_RS25560', 'BAB_RS25555', 'BAB_RS25550', 'BAB_RS25545', 'BAB_RS25535', 'BAB_RS25525', 'BAB_RS25510', 'BAB_RS25500', 'BAB_RS25465', 'BAB_RS25450', 'BAB_RS25410', 'BAB_RS25405', 'BAB_RS25400', 'BAB_RS25390', 'BAB_RS25385', 'BAB_RS25370', 'BAB_RS25355', 'BAB_RS25345', 'BAB_RS25340', 'BAB_RS25335', 'BAB_RS25325', 'BAB_RS25300']
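As a side note, a whitespace-aligned file like the sample above can probably be loaded without the manual split() step. The sketch below assumes the file looks like the sample, including the blank line after the header. It reads a shortened copy from an in-memory string rather than from file.txt, and uses pandas.read_csv with a regular-expression separator.

```python
import io

import pandas as pd

# Shortened copy of the sample input, held in memory for illustration
sample = """\
gene_ID                  start_coord       end_coord     average_cov          #reads            RPKM

BAB_RS10420                      634            2274            99.8            2521             186
BAB_RS10425                     2578            3696           133.3            2295             249
"""

# sep=r"\s+" splits on any run of whitespace; blank lines are skipped by default
df = pd.read_csv(io.StringIO(sample), sep=r"\s+")

print(df.columns.tolist())
print(df["start_coord"].tolist())
```

With a real file, the same call would take the path instead of the StringIO object, e.g. pd.read_csv('file.txt', sep=r'\s+'). Unlike the split() approach, which leaves every column as strings, this lets pandas infer numeric types for the coordinate and coverage columns.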


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
