Selecting a subset of a dataframe based on a list - pandas
I am working with a large dataframe (ftp://ftp.ncbi.nlm.nih.gov/genomes/refseq/bacteria/assembly_summary.txt) with pandas in Python 3, using PyCharm. The column that I'm interested in for now is called 'taxid'. I want to go through the dataframe and keep only the rows in which the 'taxid' value can be found in another list that I am giving the program. Not all of the items in the list will be in the dataframe.
My current issue:
I have tried using a small sample list (nodes_list = ['1707289', '251229', '14']) and it does not work. From my attempts at debugging, I can tell that this appears to be related to the apostrophes (quotes) around the values.
print(data[data['taxid'].isin([1707289, 251229, 14])])
gives me the expected values.
However,
print(data[data['taxid'].isin(['1707289', '251229', '14'])])
or print(data[data['taxid'].isin(nodes_list)])
give me an empty dataframe, with an empty index.
I do not know how to fix this and would appreciate your help!
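A minimal sketch of what is probably going on, assuming pandas parsed the 'taxid' column as integers while nodes_list holds strings: isin() compares values without converting types, so the string '14' never equals the integer 14. The small dataframe below is a hypothetical stand-in for the real file, and the names nodes_as_int, subset and subset_str are only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for assembly_summary.txt; only the 'taxid' column matters here.
data = pd.DataFrame({
    "taxid": [1707289, 251229, 562],
    "organism_name": ["example A", "example B", "example C"],
})

nodes_list = ["1707289", "251229", "14"]

# Check how the column was parsed; an integer dtype would explain the symptom.
print(data["taxid"].dtype)

# Strings compared against integers: nothing matches, so the result is empty.
no_match = data[data["taxid"].isin(nodes_list)]

# Option 1: convert the list items to integers before filtering.
nodes_as_int = [int(n) for n in nodes_list]
subset = data[data["taxid"].isin(nodes_as_int)]

# Option 2: compare as strings instead, leaving the list untouched.
subset_str = data[data["taxid"].astype(str).isin(nodes_list)]

print(subset)
```

Either option keeps only the rows whose 'taxid' appears in the list; list items that are not in the dataframe (such as '14' here) are simply ignored. Which one fits better depends on whether the rest of the code expects 'taxid' to be a number or a string.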
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow