Using isin() doesn't work when trying to remove a dataframe's column items that are not found in a list
I'm trying to remove the dataframe rows whose gene_ID is not found in a list, but it's not working for some reason. I couldn't find any similar past questions that might help.
The code:
import pandas as pd

with open('file.txt', 'r') as file_:
    # Split every line on whitespace; the first line holds the column names
    tab_delimited = [line.split() for line in file_.readlines()]
df = pd.DataFrame(tab_delimited[1:], columns=tab_delimited[0]).fillna(value='')
df.gene_ID = df.gene_ID.astype('string')
df = df[~df['gene_ID'].isin(a_list)]
I added astype('string') to convert the column data to strings in case it wasn't already, but I got the same result.
When I check whether a value that I know is in the column is found there, as follows:
a_list[0] in df['gene_ID']
I get False. Only after converting the column to a list do I get True:
a_list[0] in df['gene_ID'].to_list()
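A note on these two checks: in pandas, the `in` operator applied directly to a Series tests the index labels rather than the stored values, which would account for the False result. Here is a minimal sketch, assuming a default integer index; the Series name `gene_ids` and the result variables are illustrative and not part of the original code.

```python
import pandas as pd

# Illustrative Series built from gene IDs in the sample file.
# With no explicit index, the labels are the integers 0, 1, 2.
gene_ids = pd.Series(['BAB_RS10420', 'BAB_RS10425', 'BAB_RS10435'])

# `in` on a Series looks at the index labels, not the values.
in_index = 'BAB_RS10420' in gene_ids

# An index label (here the integer 0) is found by the same operator.
label_hit = 0 in gene_ids

# Converting to a list makes `in` compare against the values.
in_values = 'BAB_RS10420' in gene_ids.to_list()

# A value-based check that stays inside pandas.
via_isin = bool(gene_ids.isin(['BAB_RS10420']).any())

print(in_index, label_hit, in_values, via_isin)
```

So the False from the first check says nothing about whether the value is missing from the column; `isin()` or `.values` are the usual ways to test the contents.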
I'd love to get some ideas on what could be the issue.
Here's a sample of the input file:
gene_ID start_coord end_coord average_cov #reads RPKM
BAB_RS10420 634 2274 99.8 2521 186
BAB_RS10425 2578 3696 133.3 2295 249
BAB_RS10435 3878 5032 16.4 291 31
BAB_RS15955 5070 5852 240.7 2899 449
BAB_RS15960 5869 6825 176.0 2591 328
and a sample of the list a_list:
['BAB_RS25775', 'BAB_RS25755', 'BAB_RS10425', 'BAB_RS25745', 'BAB_RS25735', 'BAB_RS25730', 'BAB_RS25725', 'BAB_RS10420', 'BAB_RS25710', 'BAB_RS25700', 'BAB_RS25695', 'BAB_RS25690', 'BAB_RS25675', 'BAB_RS25665', 'BAB_RS25660', 'BAB_RS25655', 'BAB_RS25650', 'BAB_RS25645', 'BAB_RS25640', 'BAB_RS25630', 'BAB_RS15960', 'BAB_RS25620', 'BAB_RS25610', 'BAB_RS25605', 'BAB_RS25600', 'BAB_RS25595', 'BAB_RS25585', 'BAB_RS25575', 'BAB_RS25560', 'BAB_RS25555', 'BAB_RS25550', 'BAB_RS25545', 'BAB_RS25535', 'BAB_RS25525', 'BAB_RS25510', 'BAB_RS25500', 'BAB_RS25465', 'BAB_RS25450', 'BAB_RS25410', 'BAB_RS25405', 'BAB_RS25400', 'BAB_RS25390', 'BAB_RS25385', 'BAB_RS25370', 'BAB_RS25355', 'BAB_RS25345', 'BAB_RS25340', 'BAB_RS25335', 'BAB_RS25325', 'BAB_RS25300']
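One possible reading of the problem, offered as a sketch and not as a confirmed diagnosis: `~` inverts the boolean mask returned by `isin()`, so `df[~df['gene_ID'].isin(a_list)]` keeps the rows whose gene_ID is not in the list, which is the opposite of removing them. The example below uses the five gene IDs from the sample file and a shortened `a_list`; the `RPKM` values are kept as strings to mirror the whitespace-split input, and the names `mask`, `kept` and `dropped` are illustrative.

```python
import pandas as pd

# Small frame built from the sample file (values are strings, as after line.split()).
df = pd.DataFrame({
    'gene_ID': ['BAB_RS10420', 'BAB_RS10425', 'BAB_RS10435',
                'BAB_RS15955', 'BAB_RS15960'],
    'RPKM': ['186', '249', '31', '449', '328'],
})

# Shortened version of a_list, for illustration only.
a_list = ['BAB_RS25775', 'BAB_RS10425', 'BAB_RS10420', 'BAB_RS15960']

# True where the gene_ID appears in a_list.
mask = df['gene_ID'].isin(a_list)

# Without ~: rows NOT in a_list are removed, rows in a_list are kept.
kept = df[mask]

# With ~: rows in a_list are removed, rows NOT in a_list are kept.
dropped = df[~mask]

print(kept['gene_ID'].to_list())
print(dropped['gene_ID'].to_list())
```

If the intent is to keep only the genes that appear in `a_list`, the version without `~` is the one that does so under these assumptions.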
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
