Find first non-zero value in each column of pandas DataFrame
What is a pandoric (idiomatic pandas) way to get the value and index of the first non-zero element in each column of a DataFrame, scanning top to bottom?
import pandas as pd
df = pd.DataFrame([[0, 0, 0],
[0, 10, 0],
[4, 0, 0],
[1, 2, 3]],
columns=['first', 'second', 'third'])
print(df.head())
#    first  second  third
# 0      0       0      0
# 1      0      10      0
# 2      4       0      0
# 3      1       2      3
What I would like to achieve:
#         value  pos
# first       4    2
# second     10    1
# third       3    3
Solution 1:[1]
Here's the long-winded way, which should be faster if your non-zero values tend to occur near the start of large columns:
import pandas as pd
df = pd.DataFrame([[0, 0, 0],[0, 10, 0],[4, 0, 0],[1, 2, 3]],
columns=['first', 'second', 'third'])
# first (value, position) pair with a non-zero value per column; (0, 0) if the column is all zeros
res = [next(((j, i) for i, j in enumerate(df[col]) if j != 0), (0, 0)) for col in df]
df_res = pd.DataFrame(res, columns=['value', 'position'], index=df.columns)
print(df_res)
        value  position
first       4         2
second     10         1
third       3         3
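The speed claim rests on next() consuming the generator lazily: it stops at the first non-zero element, so the rest of the column is never read. A minimal sketch that makes this visible, assuming a hypothetical helper first_nonzero that wraps the answer's generator expression and a hypothetical tracked wrapper that records every element read:

```python
import pandas as pd

def first_nonzero(values, default=(0, 0)):
    # hypothetical helper around the answer's generator expression;
    # next() pulls items only until the first non-zero value appears
    return next(((v, i) for i, v in enumerate(values) if v != 0), default)

visited = []

def tracked(values):
    # hypothetical wrapper: yields each element and records that it was read
    for v in values:
        visited.append(v)
        yield v

# a long column whose first non-zero value sits at position 1
col = pd.Series([0, 7] + [0] * 1_000_000)
print(first_nonzero(tracked(col)))
print(len(visited))  # 2: only the first two elements were read

# an all-zero column falls back to the default pair
print(first_nonzero(pd.Series([0, 0, 0])))
```

The cost of the scan therefore depends on where the first non-zero sits, not on the column length; a column with no non-zero values is still read in full.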
Solution 2:[2]
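I would use stack; the index of the result holds the row and column labels. The mask keeps only cells that are non-zero and equal to their row's maximum:
df[df.eq(df.max(axis=1), axis=0) & df.ne(0)].stack().dropna()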
Out[252]:
1 second 10.0
2 first 4.0
3 third 3.0
dtype: float64
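The row-maximum mask matches the first non-zero of each column only when every such value is also the largest in its row, which happens to hold for the sample frame. A sketch with a small hypothetical counter-example, followed by a column-wise variant that still uses stack: a running count of non-zeros down each column equals 1 exactly at that column's first non-zero. The trailing dropna() is there because newer pandas stack implementations may keep NaN cells.

```python
import pandas as pd

# hypothetical counter-example: the first non-zero in 'first' (1) is not its row's maximum
df = pd.DataFrame([[0, 5], [1, 7]], columns=['first', 'second'])

# row-maximum mask: finds 'second' twice and misses 'first' entirely
naive = df[df.eq(df.max(axis=1), axis=0) & df.ne(0)].stack().dropna()
print(naive)

# column-wise mask: cumulative count of non-zeros hits 1 at each column's first non-zero
nonzero = df.ne(0)
first_mask = nonzero & nonzero.astype(int).cumsum().eq(1)
hits = df.where(first_mask).stack().dropna()

# reshape the (row, column) -> value Series into the question's value/pos layout
out = (hits.reset_index()
           .set_axis(['pos', 'column', 'value'], axis=1)
           .set_index('column')
           .reindex(df.columns)[['value', 'pos']])
print(out)
```

Columns with no non-zero value come out of the reindex step as NaN rows rather than disappearing silently.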
Solution 3:[3]
You can also use NumPy's nonzero function for this:
# index of the first non-zero entry in each column (raises IndexError if a column is all zeros)
positions = [df[col].to_numpy().nonzero()[0][0] for col in df]
df_res = pd.DataFrame({'value': df.to_numpy()[positions, range(df.shape[1])],
                       'position': positions}, index=df.columns)
print(df_res)
        value  position
first       4         2
second     10         1
third       3         3
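The list comprehension calls nonzero once per column, and nonzero()[0][0] raises IndexError for a column with no non-zero values. A vectorized sketch of the same lookup, swapping nonzero for argmax on a boolean mask (a different technique from the answer's), assuming NumPy is available and using a hypothetical all-zero column named empty to show the edge case:

```python
import numpy as np
import pandas as pd

# question's frame plus a hypothetical all-zero column named 'empty'
df = pd.DataFrame([[0, 0, 0], [0, 10, 0], [4, 0, 0], [1, 2, 3]],
                  columns=['first', 'second', 'third']).assign(empty=0)

arr = df.to_numpy()
nonzero = arr != 0
# argmax on a boolean array returns the position of the first True in each column
pos = nonzero.argmax(axis=0)
# argmax also returns 0 when a column has no True at all, so track that separately
found = nonzero.any(axis=0)
# pick one element per column: row pos[j] of column j
values = arr[pos, np.arange(arr.shape[1])]

df_res = pd.DataFrame({'value': values, 'position': pos}, index=df.columns)
df_res = df_res.loc[found]  # drop columns that contain no non-zero value
print(df_res)
```

Unlike the generator in Solution 1, this always scans the whole array, but it does so in a single vectorized pass instead of a Python-level loop per column.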
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jpp |
| Solution 2 | BENY |
| Solution 3 | Bill |
