'Creating a new Dataframe based on rows with certain values and removing the rows from the original Dataframe
I try to separate a Dataframe based on rows with a certain value in multiple columns, so that the original Dataframe is split in two with all rows containing the value in one Dataframe and the other Dataframe with the residual rows.
df = pd.DataFrame(np.random.randint(-1,100,size=(100, 4)), columns=list('ABCD'))
df
A B C D
0 51 86 15 80
1 61 53 75 66
2 80 48 23 58
3 86 25 37 99
4 50 11 87 71
... ... ... ... ...
95 34 40 43 40
96 89 16 83 72
97 97 32 24 26
98 27 83 75 29
99 24 50 40 43
100 rows × 4 columns
df[~df.isin([-1])].dropna()
A B C D
0 51 86 15 80.0
1 61 53 75 66.0
2 80 48 23 58.0
3 86 25 37 99.0
4 50 11 87 71.0
... ... ... ... ...
95 34 40 43 40.0
96 89 16 83 72.0
97 97 32 24 26.0
98 27 83 75 29.0
99 24 50 40 43.0
98 rows × 4 columns
df[df.isin([-1])].dropna()
A B C D
is what i tried so far and the first part worked correctly. However df[df.isin([-1])].dropna() failed.
Solution 1:[1]
Your code is almost correct. Use any(axis=1) to keep only one boolean value for each row instead of using dropna(how='all')
The same with a reproducible example:
import pandas as pd
import numpy as np
np.random.seed(2022)
vals = np.random.choice([-1, 0, 1], size=(10, 4), p=[.2, .4, .4])
df = pd.DataFrame(vals, columns=list('ABCD'))
m = df.isin([-1]).any(axis=1) # or df.eq(-1).any(axis=1)
df1, df2 = df[m], df[~m]
Output:
>>> df.assign(M=m)
A B C D M
0 -1 0 -1 -1 True
1 1 0 1 1 False
2 1 1 1 1 False
3 1 1 0 0 False
4 0 1 1 -1 True
5 1 0 0 1 False
6 -1 0 1 0 True
7 0 0 0 0 False
8 1 -1 1 0 True
9 1 1 0 1 False
>>> df1
A B C D
0 -1 0 -1 -1
4 0 1 1 -1
6 -1 0 1 0
8 1 -1 1 0
>>> df2
A B C D
1 1 0 1 1
2 1 1 1 1
3 1 1 0 0
5 1 0 0 1
7 0 0 0 0
9 1 1 0 1
Solution 2:[2]
Lets assume you would like to filter data by value equals to 80.
Possible solution is the following:
# pip install pandas
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(-1,100,size=(100, 4)), columns=list('ABCD'))
df
# df1 = df[~df.isin([80])].dropna().reset_index(drop=True)
# or
df1 = df[~df.eq(80).any(1)].reset_index(drop=True)
df1
df2 = df[df.eq(80).any(1)].reset_index(drop=True)
df2
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Corralien |
| Solution 2 |



