Pandas (Python): Fill empty cells with the previous row value?
I want to fill empty cells with the previous row's value, but only if that value starts with a number. For example, I have:
Text Text
30 Text Text
Text Text
Text Text
31 Text Text
Text Text
31 Text Text
Text Text
Text Text
32 Text Text
Text Text
Text Text
Text Text
Text Text
Text Text
However, I want to have:
Text Text
30 Text Text
30 Text Text
30 Text Text
31 Text Text
Text Text
31 Text Text
31 Text Text
31 Text Text
32 Text Text
Text Text
Text Text
Text Text
Text Text
Text Text
I tried to achieve this with the following code:
data = pd.read_csv('DATA.csv',sep='\t', dtype=object, error_bad_lines=False)
data = data.fillna(method='ffill', inplace=True)
print(data)
but it did not work.
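A minimal sketch of why this attempt fails, assuming a small tab-separated sample with a header row (the column names "id" and "body" are hypothetical). Two things go wrong: with inplace=True the method returns None, so assigning its result back replaces the DataFrame with None; and even without that bug, a plain forward fill copies any previous value, including text, into the blanks. The sketch uses ffill(), since fillna(method=...) was deprecated in pandas 2.1; error_bad_lines was likewise deprecated in pandas 1.3 in favour of on_bad_lines and removed in pandas 2.0.

```python
import io

import pandas as pd

# Hypothetical tab-separated sample shaped like the question's data:
# "id" holds a number, a blank, or text; "body" holds the rest of the line.
raw = "id\tbody\n30\tText\n\tText\nText\tText\n\tText\n"
data = pd.read_csv(io.StringIO(raw), sep="\t", dtype=object)  # blanks -> NaN

# Problem 1: with inplace=True the frame is modified and None is returned,
# so `data = data.fillna(..., inplace=True)` rebinds `data` to None.
returned = data.copy().ffill(inplace=True)
print(returned)  # None

# Problem 2: a plain forward fill copies *any* previous value, including text,
# so the blank after the "Text" row becomes "Text" instead of staying empty.
filled_ids = data.ffill()["id"].tolist()
print(filled_ids)  # ['30', '30', 'Text', 'Text']
```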
Is there any way to do this?
Solution 1:[1]
First, replace your empty cells with NaNs:
import numpy as np
df.loc[df[0] == "", 0] = np.nan  # only the empty cells of column 0 become NaN
Now, use ffill():
df.ffill()
# 0
#0 Text
#1 30
#2 30
#3 30
#4 31
#5 Text
#6 31
#7 31
#8 31
#9 32
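A self-contained sketch of this answer, assuming a single-column frame labelled 0 in which empty cells are empty strings (the sample values are hypothetical). It also shows a limitation worth knowing: a plain forward fill copies "Text" into blanks that follow a text row, whereas the question wants those blanks left empty.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column frame (column label 0); "" stands for an empty cell.
df = pd.DataFrame({0: ["Text", "30", "", "", "31", "Text", "", ""]})

# Turn only the empty cells of column 0 into NaN, then forward-fill.
df.loc[df[0] == "", 0] = np.nan
result = df.ffill()
print(result[0].tolist())
# ['Text', '30', '30', '30', '31', 'Text', 'Text', 'Text']
```

The last two rows show that the text value is propagated too, so this answer alone covers only the case where every blank follows a number.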
Solution 2:[2]
I think you can first replace empty and whitespace-only values with NaN:
df.Text = df.Text[df.Text.str.strip() != '']
print (df)
Text Text.1
0 30 Text Text
1 NaN Text Text
2 NaN Text Text
3 31 Text Text
4 Text Text NaN
5 31 Text Text
6 NaN Text Text
7 NaN Text Text
8 32 Text Text
9 Text Text NaN
10 NaN Text Text
11 NaN Text Text
12 NaN Text Text
13 NaN Text Text
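A small sketch of this first step on a hypothetical column, comparing the boolean-filter approach above with a regex-based alternative (Series.replace with regex=True). The filter works because assigning the shorter, filtered Series back aligns on the index and leaves NaN in the dropped positions; reindex reproduces that alignment here.

```python
import numpy as np
import pandas as pd

# Hypothetical column containing numbers, text, blanks and whitespace-only cells.
s = pd.Series(["30", "", "   ", "31", "Text Text", "\t"])

# Approach from this answer: keep only non-blank values; realigning on the
# original index puts NaN where values were dropped.
via_filter = s[s.str.strip() != ""].reindex(s.index)

# Alternative: replace cells that are empty or whitespace-only with NaN.
via_regex = s.replace(r"^\s*$", np.nan, regex=True)

print(via_filter.tolist())
print(via_regex.tolist())
```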
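Then use ffill (equivalent to fillna with method='ffill'), use to_numeric with errors='coerce' to find which forward-filled values are numeric, use where to restore the original value wherever the filled value is not numeric, and finally replace the remaining NaN with an empty string using fillna: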
orig = df.Text.copy()
df.Text = df.Text.ffill()
mask1 = pd.to_numeric(df.Text, errors='coerce').notna()  # boolean: True where the filled value is numeric
df.Text = df.Text.where(mask1, orig).fillna('')
print (df)
Text Text.1
0 30 Text Text
1 30 Text Text
2 30 Text Text
3 31 Text Text
4 Text Text NaN
5 31 Text Text
6 31 Text Text
7 31 Text Text
8 32 Text Text
9 Text Text NaN
10 Text Text
11 Text Text
12 Text Text
13 Text Text
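In the outputs above, the first input line was consumed as the header (hence the column names Text and Text.1). The sketch below wraps the same ffill / to_numeric / where steps in a reusable function and applies it right after read_csv, assuming a file that has a real header row; the helper name ffill_numeric_only and the column names "id" and "body" are hypothetical.

```python
import io

import pandas as pd


def ffill_numeric_only(col: pd.Series) -> pd.Series:
    """Hypothetical helper: forward-fill blanks only below numeric values.

    Blanks (NaN) that follow a non-numeric value become empty strings.
    """
    filled = col.ffill()
    # Boolean mask: True where the forward-filled value parses as a number.
    is_num = pd.to_numeric(filled, errors="coerce").notna()
    # Keep numeric fills; elsewhere restore the original value, then blank out NaN.
    return filled.where(is_num, col).fillna("")


# Hypothetical sample with an explicit header, so no data row becomes the header.
raw = "id\tbody\n30\tx\n\tx\n\tx\n31\tx\nText\tx\n31\tx\n\tx\n32\tx\nText\tx\n\tx\n"
df = pd.read_csv(io.StringIO(raw), sep="\t", dtype=object)
df["id"] = ffill_numeric_only(df["id"])
print(df["id"].tolist())
# ['30', '30', '30', '31', 'Text', '31', '31', '32', 'Text', '']
```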
Solution 3:[3]
In general, if you want to fill empty cells with the previous row value, you can just use a recursive function like:
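import numpy as np
import pandas as pd

def same_as_upper(col: pd.Series) -> pd.Series:
    '''
    Recursively fill NaN rows with the previous value
    '''
    col = pd.Series(col)
    if any(col.isna()):
        # Keep the original index so the result aligns with the DataFrame
        filled = pd.Series(np.where(col.isna(), col.shift(1), col), index=col.index)
        # Stop when nothing more can be filled (e.g. leading NaNs), otherwise this recurses forever
        if filled.isna().sum() == col.isna().sum():
            return filled
        return same_as_upper(filled)
    else:
        return col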
Applying the function with df['A'] = same_as_upper(df['A']), this simple case with your data would render:
A | B | C |
---|---|---|
Text | Text | nan |
30 | Text | Text |
30 | Text | Text |
30 | Text | Text |
31 | Text | Text |
Text | Text | nan |
31 | Text | Text |
31 | Text | Text |
31 | Text | Text |
32 | Text | Text |
Text | Text | nan |
Text | Text | Text |
Text | Text | Text |
Text | Text | Text |
Text | Text | Text |
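Each recursive call fills only the first remaining blank of every run, so a run of k empty cells needs k nested calls, and Python's default recursion limit is 1000. As a sketch, the same idea can be written as a loop, which gives the same values as pandas' built-in Series.ffill() on this kind of data; fill_from_above is a hypothetical name, and the loop stops when a pass fills nothing, so leading NaNs stay NaN.

```python
import numpy as np
import pandas as pd


def fill_from_above(col: pd.Series) -> pd.Series:
    """Hypothetical loop-based variant of same_as_upper.

    Each pass copies the row above into every NaN whose upper neighbour is
    non-missing; the loop stops when a pass fills nothing.
    """
    col = pd.Series(col)
    while True:
        filled = col.where(col.notna(), col.shift(1))
        if filled.isna().sum() == col.isna().sum():
            return filled  # nothing left that can be filled (e.g. leading NaNs)
        col = filled


s = pd.Series(["Text", "30", np.nan, np.nan, "31", "Text", np.nan])
print(fill_from_above(s).tolist())
print(fill_from_above(s).tolist() == s.ffill().tolist())  # True
```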
However, you have the additional constraint of not propagating the previous value when that value is not an integer. In that case, one workaround is to put a placeholder value in the appropriate cell, apply the same function, and then replace the placeholder with NaN:
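df.loc[11,'A']=999  # placeholder in the first empty cell after the "Text" row at index 10
df['A']=same_as_upper(df['A'])  # the placeholder is propagated instead of "Text"
df['A']=df['A'].replace(999,np.nan)  # turn the placeholder back into NaN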
Result:
A | B | C |
---|---|---|
Text | Text | nan |
30 | Text | Text |
30 | Text | Text |
30 | Text | Text |
31 | Text | Text |
Text | Text | nan |
31 | Text | Text |
31 | Text | Text |
31 | Text | Text |
32 | Text | Text |
Text | Text | nan |
nan | Text | Text |
nan | Text | Text |
nan | Text | Text |
nan | Text | Text |
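The placeholder trick relies on knowing the row index (11) in advance. A sketch of a different, vectorised technique that needs no hard-coded index: group each blank with the non-missing value above it, and copy that value into the blanks only when it is numeric. The sample reproduces column A from the question, with NaN marking empty cells.

```python
import numpy as np
import pandas as pd

# Column A from the question (NaN marks an empty cell).
a = pd.Series(["Text", "30", np.nan, np.nan, "31", "Text", "31", np.nan,
               np.nan, "32", "Text", np.nan, np.nan, np.nan, np.nan])

# Every non-missing value starts a new block; blanks join the block above.
block = a.notna().cumsum()
# The value that opened each row's block (what a forward fill would copy).
head = a.groupby(block).transform("first")
# Fill blanks from the block head only when that head is numeric.
is_num = pd.to_numeric(head, errors="coerce").notna()
result = a.fillna(head.where(is_num))
print(result.tolist())
```

The printed values match the result table above: blanks below 30, 31 and 32 are filled, while the four blanks below the last "Text" row stay NaN.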
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | DYZ |
Solution 2 | |
Solution 3 | Lucas |