Selecting a subset of a dataframe based on a list - pandas

I am working with a large dataframe (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt) with pandas in Python 3, using PyCharm. The column that I'm interested in for now is called 'taxid'. I want to go through the dataframe and keep only the rows in which the 'taxid' value can be found in another list that I am giving the program. Not all of the items in the list will be in the dataframe.

My current issue: I have tried using a small sample list (nodes_list = ['1707289', '251229', '14']) and filtering with it returns no rows. From my attempts at debugging, I can tell that this appears to be related to the apostrophes around the list items.

print(data[data['taxid'].isin([1707289, 251229, 14])]) gives me the expected values.

However,

print(data[data['taxid'].isin(['1707289', '251229', '14'])]) or print(data[data['taxid'].isin(nodes_list)]) give me an empty dataframe, with an empty index.

I do not know how to fix this and would appreciate your help!
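
The two results above are consistent with a type mismatch: the quoted items are strings, while 'taxid' appears to have been read in as integers, and a string never compares equal to an integer in isin. A minimal sketch of two possible fixes follows. It uses a small made-up dataframe in place of the real assembly_summary.txt, so the rows, the 'organism_name' values, and the extra ID '999999' are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical stand-in for the real table; only the 'taxid' column matters here.
# Assumption: pandas parsed 'taxid' as integers, as the working isin([...]) call suggests.
data = pd.DataFrame({
    "taxid": [1707289, 251229, 14, 562],
    "organism_name": ["a", "b", "c", "d"],  # placeholder values
})

# List of IDs given as strings; '999999' is deliberately absent from the dataframe.
nodes_list = ["1707289", "251229", "14", "999999"]

# Reproduces the problem: string items are compared against an integer column.
no_match = data[data["taxid"].isin(nodes_list)]

# Option 1: convert the list items to integers before filtering.
nodes_as_int = [int(n) for n in nodes_list]
subset_int = data[data["taxid"].isin(nodes_as_int)]

# Option 2: compare the column as strings and leave the list unchanged.
subset_str = data[data["taxid"].astype(str).isin(nodes_list)]

print(data["taxid"].dtype)  # check what type the column actually has
print(len(no_match), len(subset_int), len(subset_str))
```

Checking data['taxid'].dtype on the real dataframe first shows which side needs converting. Items in the list that do not occur in the dataframe are simply ignored by isin.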



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
