'Creating a new Dataframe based on rows with certain values and removing the rows from the original Dataframe

I try to separate a Dataframe based on rows with a certain value in multiple columns, so that the original Dataframe is split in two with all rows containing the value in one Dataframe and the other Dataframe with the residual rows.

df = pd.DataFrame(np.random.randint(-1,100,size=(100, 4)), columns=list('ABCD'))

df
    A   B   C   D
0   51  86  15  80
1   61  53  75  66
2   80  48  23  58
3   86  25  37  99
4   50  11  87  71
... ... ... ... ...
95  34  40  43  40
96  89  16  83  72
97  97  32  24  26
98  27  83  75  29
99  24  50  40  43
100 rows × 4 columns

df[~df.isin([-1])].dropna()
    A   B   C   D
0   51  86  15  80.0
1   61  53  75  66.0
2   80  48  23  58.0
3   86  25  37  99.0
4   50  11  87  71.0
... ... ... ... ...
95  34  40  43  40.0
96  89  16  83  72.0
97  97  32  24  26.0
98  27  83  75  29.0
99  24  50  40  43.0
98 rows × 4 columns

df[df.isin([-1])].dropna()
A   B   C   D

is what i tried so far and the first part worked correctly. However df[df.isin([-1])].dropna() failed.



Solution 1:[1]

Your code is almost correct. Use any(axis=1) to keep only one boolean value for each row instead of using dropna(how='all')

The same with a reproducible example:

import pandas as pd
import numpy as np

np.random.seed(2022)
vals = np.random.choice([-1, 0, 1], size=(10, 4), p=[.2, .4, .4])
df = pd.DataFrame(vals, columns=list('ABCD'))

m = df.isin([-1]).any(axis=1)  # or df.eq(-1).any(axis=1)
df1, df2 = df[m], df[~m]

Output:

>>> df.assign(M=m)
   A  B  C  D      M
0 -1  0 -1 -1   True
1  1  0  1  1  False
2  1  1  1  1  False
3  1  1  0  0  False
4  0  1  1 -1   True
5  1  0  0  1  False
6 -1  0  1  0   True
7  0  0  0  0  False
8  1 -1  1  0   True
9  1  1  0  1  False

>>> df1
   A  B  C  D
0 -1  0 -1 -1
4  0  1  1 -1
6 -1  0  1  0
8  1 -1  1  0

>>> df2
   A  B  C  D
1  1  0  1  1
2  1  1  1  1
3  1  1  0  0
5  1  0  0  1
7  0  0  0  0
9  1  1  0  1

Solution 2:[2]

Lets assume you would like to filter data by value equals to 80.

Possible solution is the following:

# pip install pandas

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(-1,100,size=(100, 4)), columns=list('ABCD'))
df

enter image description here

# df1 = df[~df.isin([80])].dropna().reset_index(drop=True)
# or
df1 = df[~df.eq(80).any(1)].reset_index(drop=True)
df1

enter image description here

df2 = df[df.eq(80).any(1)].reset_index(drop=True)
df2

enter image description here

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Corralien
Solution 2