Get value from right-most column consisting of a value and store in new column?
Below is a current sample df.
+-------+---+---+---+---+---+---+
| name  | 1 | 2 | 3 | 4 | 5 | 6 |
+-------+---+---+---+---+---+---+
| mark  | a | b | c | d |   |   |
| joe   | a | b |   |   |   |   |
| bob   | a | b | c | d | e | f |
| luke  | a |   |   |   |   |   |
| hana  | a | b | c |   |   |   |
| kelly | a | b | c |   |   |   |
+-------+---+---+---+---+---+---+
My desired output is below, with a "final" column that pulls the value from the right-most column that contains a value.
+-------+---+---+---+---+---+---+-------+
| name  | 1 | 2 | 3 | 4 | 5 | 6 | final |
+-------+---+---+---+---+---+---+-------+
| mark  | a | b | c | d |   |   | d     |
| joe   | a | b |   |   |   |   | b     |
| bob   | a | b | c | d | e | f | f     |
| luke  | a |   |   |   |   |   | a     |
| hana  | a | b | c |   |   |   | c     |
| kelly | a | b | c |   |   |   | c     |
+-------+---+---+---+---+---+---+-------+
I've tried to use the following if statement (along w/ a thousand other things), but it doesn't seem to work...
if df['2'].isna == True:
    df['final']=df['1']
elif df['2'].isna == False and df['3'].isna == True:
    df['final']=df['2']
I figure I need to use a function here? But after several hours of perusing stack overflow and experimenting, I'm stuck.
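For context on why the attempt above fails, here is a minimal sketch. The three-row frame is a made-up stand-in, not the asker's data. Without parentheses, `df['2'].isna` is a method object, so comparing it to `True` is always `False`. Even when it is called, the result is one boolean per row, which a plain `if` cannot reduce to a single truth value.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's frame
df = pd.DataFrame({"1": ["a", "a", "a"], "2": ["b", np.nan, "b"]})

# Without parentheses this compares a method object to True,
# so neither branch of the original if/elif can ever run
method_compare = df["2"].isna == True

# With parentheses, isna() returns a boolean Series (one value per row);
# using it directly in `if` raises ValueError
try:
    if df["2"].isna():
        pass
    raised = False
except ValueError:
    raised = True

print(method_compare, raised)
```

This is why the answers below work on whole rows or columns at once instead of branching with `if`.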
Solution 1:[1]
This is a simplified version using ffill:
df['final'] = df.ffill(axis=1).iloc[:, -1]
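ffill(axis=1) fills each NaN with the nearest non-NaN value to its left in the same row, so after the fill the last column holds each row's right-most value. iloc[:, -1] then selects that last column as 'final'. Note that the name column takes part in the fill too, so a row with no values at all would end up with its name in 'final'.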
Output:
    name  1    2    3    4    5    6 final
0   mark  a    b    c    d  NaN  NaN     d
1    joe  a    b  NaN  NaN  NaN  NaN     b
2    bob  a    b    c    d    e    f     f
3   luke  a  NaN  NaN  NaN  NaN  NaN     a
4   hana  a    b    c  NaN  NaN  NaN     c
5  kelly  a    b    c  NaN  NaN  NaN     c
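To try this end to end, the sketch below rebuilds the question's sample frame and applies the same ffill idea. It assumes the blank cells are NaN. The `value_cols` list is a name introduced here for illustration; it restricts the fill to the numbered columns so the `name` column cannot leak into `final`.

```python
import numpy as np
import pandas as pd

# Sample frame from the question, assuming blank cells are NaN
df = pd.DataFrame({
    "name": ["mark", "joe", "bob", "luke", "hana", "kelly"],
    "1": ["a", "a", "a", "a", "a", "a"],
    "2": ["b", "b", "b", np.nan, "b", "b"],
    "3": ["c", np.nan, "c", np.nan, "c", "c"],
    "4": ["d", np.nan, "d", np.nan, np.nan, np.nan],
    "5": [np.nan, np.nan, "e", np.nan, np.nan, np.nan],
    "6": [np.nan, np.nan, "f", np.nan, np.nan, np.nan],
})

# Illustrative helper list: only these columns take part in the fill
value_cols = ["1", "2", "3", "4", "5", "6"]

# Push each row's last non-NaN value to the right, then read the last column
df["final"] = df[value_cols].ffill(axis=1).iloc[:, -1]

print(df["final"].tolist())
```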
Solution 2:[2]
You can write a short function and use apply on each row:
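import numpy as np
import pandas as pd

df = pd.DataFrame({'1': ['a', 'b', 'c'],
                   '2': ['a', 'd', np.nan],
                   '3': ['f', np.nan, np.nan]
                   }, index=['mark', 'james', 'bob'])

def first_nan(row):
    # Count the missing values; assumes they all sit at the end of the row
    loc = row.isna().sum()
    # Step back past the trailing NaNs to the last filled cell
    return row.iloc[-(loc + 1)]

df['final'] = df.apply(first_nan, axis=1)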
Output:
       1    2    3 final
mark   a    a    f     f
james  b    d  NaN     d
bob    c  NaN  NaN     c
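The function above counts NaNs and steps back from the end of the row, which only works when every missing value sits at the end. If a row can have a gap in the middle, one possible variation (a sketch, not the answerer's code) is to use `Series.last_valid_index()`, which returns the label of the last non-NaN entry, or `None` if there is none. The helper name `last_valid` is made up for this example.

```python
import numpy as np
import pandas as pd

def last_valid(row):
    # Label of the right-most non-NaN cell, or None if the row is all NaN
    idx = row.last_valid_index()
    return row.loc[idx] if idx is not None else np.nan

# 'mark' has a gap in column '2' but a value in column '3'
df = pd.DataFrame({'1': ['a', 'b', 'c'],
                   '2': [np.nan, 'd', np.nan],
                   '3': ['f', np.nan, np.nan]
                   }, index=['mark', 'james', 'bob'])

df['final'] = df.apply(last_valid, axis=1)
print(df['final'].tolist())
```

For 'mark', counting NaNs would step back one cell from the end and land on the gap in column '2', whereas this version returns the value from column '3'.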
Solution 3:[3]
Try:
df["final"]=df[df.columns[::-1]].bfill(axis=1).iloc[:,0]
Note: if the empty cells hold empty strings rather than NaN, convert them first:
df=df.replace("", np.nan)
Using @nrfd's sample input, the output is:
df = pd.DataFrame({'1': ['a', 'b', 'c'],
                   '2': ['a', 'd', np.nan],
                   '3': ['f', np.nan, np.nan]
                   }, index=['mark', 'james', 'bob'])
df["final"]=df[df.columns[::-1]].bfill(axis=1).iloc[:,0]
>>> df
       1    2    3 final
mark   a    a    f     f
james  b    d  NaN     d
bob    c  NaN  NaN     c
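The two steps of this answer (converting empty strings, then the reversed bfill) can be combined as in the sketch below. It assumes the blanks are exact empty strings; the small frame and the `value_cols` list are made up for illustration, and the fill is limited to the numbered columns so that `name` stays out of it.

```python
import numpy as np
import pandas as pd

# Hypothetical frame where blanks are empty strings rather than NaN
df = pd.DataFrame({
    "name": ["mark", "joe", "luke"],
    "1": ["a", "a", "a"],
    "2": ["b", "b", ""],
    "3": ["c", "", ""],
})

# Illustrative helper list: only these columns are searched for a value
value_cols = ["1", "2", "3"]

# Turn empty strings into NaN so bfill treats them as missing
cleaned = df[value_cols].replace("", np.nan)

# Reverse the column order, back-fill along each row, take the first column
df["final"] = cleaned[value_cols[::-1]].bfill(axis=1).iloc[:, 0]

print(df["final"].tolist())
```

Cells that contain only whitespace would not match `""` and would need their own cleanup before this step.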
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Thomas Gamsjäger |
| Solution 2 | |
| Solution 3 | Grzegorz Skibinski |
