How to select rows from a pandas DataFrame by column value

I am analyzing a dataset with 6 zero-based classes (0 to 5). The dataset contains many thousands of items.

I need two DataFrames: one containing classes 0 & 1, and a second containing classes 3 & 5.

I can get 0 & 1 together easily enough:

mnist_01 = mnist.loc[mnist['class'] <= 1]

However, I am not sure how to get classes 3 & 5 in a single step. What I would like to be able to do is:

mnist_35 = mnist.loc[mnist['class'] == (3 or 5)]

...rather than filtering each class separately and concatenating the results:

mnist_3 = mnist.loc[mnist['class'] == 3]
mnist_5 = mnist.loc[mnist['class'] == 5]
mnist_35 = pd.concat([mnist_3, mnist_5], axis=0)
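Two things are going on in the question's code. First, `(3 or 5)` is evaluated by plain Python before pandas ever sees it: `or` returns its first truthy operand, so the expression reduces to `3` and the filter matches class 3 only. Second, the one-step equivalent of the `concat` version is an element-wise OR of two boolean masks using `|`. The sketch below uses a small made-up frame (not the real dataset) with interleaved classes; it also shows that a single mask keeps rows in their original order, whereas `concat` places all class-3 rows before all class-5 rows.

```python
import pandas as pd

# Hypothetical stand-in for the real dataset, with classes interleaved.
mnist = pd.DataFrame({'class': [5, 3, 0, 5, 3, 1]})

# `or` is evaluated by Python first and returns its first truthy operand.
print(3 or 5)  # 3
only_3 = mnist.loc[mnist['class'] == (3 or 5)]
print(only_3.index.tolist())  # [1, 4]

# Element-wise OR of two boolean masks. Each comparison needs its own
# parentheses because `|` binds more tightly than `==`.
mnist_35 = mnist.loc[(mnist['class'] == 3) | (mnist['class'] == 5)]
print(mnist_35.index.tolist())  # [0, 1, 3, 4]

# The concat approach from the question: same rows, but grouped by class.
mnist_3 = mnist.loc[mnist['class'] == 3]
mnist_5 = mnist.loc[mnist['class'] == 5]
concatenated = pd.concat([mnist_3, mnist_5], axis=0)
print(concatenated.index.tolist())  # [1, 4, 0, 3]

# Sorting the concat result by index recovers exactly the masked frame.
print(concatenated.sort_index().equals(mnist_35))  # True
```

Writing `mnist['class'] == 3 or mnist['class'] == 5` (with `or` instead of `|`) does not work either: Python asks the first Series for a single truth value, and pandas raises a `ValueError` because that value is ambiguous.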


Solution 1:[1]

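You can use isin, which builds a boolean mask that is True wherever the column's value appears in the collection you pass. A set works, but it is not needed for fast lookups: pandas converts the argument to an array before matching, so a list such as [3, 5] works just as well: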

import pandas as pd

mnist = pd.DataFrame({'class': [0, 1, 2, 3, 4, 5],
                      'val': ['a', 'b', 'c', 'd', 'e', 'f']})

>>> mnist.loc[mnist['class'].isin({3, 5})]
   class val
3      3   d
5      5   f

>>> mnist.loc[mnist['class'].isin({0, 1})]
   class val
0      0   a
1      1   b
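Applied to the question, the same call produces both requested frames with one line each. A minimal sketch on the small example frame (hypothetical data, not the real dataset); it passes lists, which `isin` accepts as readily as sets, and uses `~` to invert the mask when you want every class except the listed ones:

```python
import pandas as pd

# Small illustrative frame; the real dataset has thousands of rows.
mnist = pd.DataFrame({'class': [0, 1, 2, 3, 4, 5],
                      'val': ['a', 'b', 'c', 'd', 'e', 'f']})

# The two subsets requested in the question.
mnist_01 = mnist.loc[mnist['class'].isin([0, 1])]
mnist_35 = mnist.loc[mnist['class'].isin([3, 5])]

# `~` negates the boolean mask: every class except 3 and 5.
mnist_not_35 = mnist.loc[~mnist['class'].isin([3, 5])]

print(mnist_01['val'].tolist())        # ['a', 'b']
print(mnist_35['val'].tolist())        # ['d', 'f']
print(mnist_not_35['class'].tolist())  # [0, 1, 2, 4]
```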

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1