Fill in between surrogate rows with NaN in Pandas

Given a DataFrame df:

elapse,data,datx
0,4,0
2,0,4
4,3,2
6,3,1
14,3,0
16,1,1
18,3,1
20,2,0
22,4,1
24,0,4

The elapse column is missing some values, specifically 8, 10 and 12.

I would like to add rows for 8, 10 and 12 to the elapse column and assign np.nan to the other columns (i.e., data and datx).

The result should look like this:

0.00000,4.00000,0.00000
2.00000,0.00000,4.00000
4.00000,3.00000,2.00000
6.00000,3.00000,1.00000
8.00000,nan,nan
10.00000,nan,nan
12.00000,nan,nan
14.00000,3.00000,0.00000
16.00000,1.00000,1.00000
18.00000,3.00000,1.00000
20.00000,2.00000,0.00000
22.00000,4.00000,1.00000
24.00000,0.00000,4.00000

The following code does the job:

import pandas as pd
import numpy as np

np.random.seed(0)

arr=np.concatenate([np.arange(0,8,2),np.arange(14,26,2)])
df=pd.DataFrame({'elapse': arr, 'data': np.random.randint(5,size=(len(arr))),
'datx': np.random.randint(5,size=(len(arr)))}, columns=['elapse', 'data','datx'])

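# Differences between consecutive elapse values; the largest one marks the gap
g=df['elapse'].diff()
hh=g.idxmax()          # label of the first row after the gap
interval_val=g.min()   # regular step between rows
missval_start=df.loc[hh-1,'elapse']+interval_val
missval_end=df.loc[hh,'elapse']   # exclusive upper bound for np.arange
new_val=np.arange(missval_start,missval_end,interval_val)
df_new=pd.DataFrame(new_val,columns=['elapse'])
df_new[['data','datx']]=np.nan

# Append the missing rows and restore the order (this handles a single gap only)
df=pd.concat([df,df_new])
df=df.sort_values('elapse')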

However, I am curious whether others have a better suggestion than mine. In the real case, the number of missing values is huge, so I would welcome a more compact and efficient approach.



Solution 1:[1]

You can use the reindex method:

df = df.set_index('elapse').reindex(range(df['elapse'].min(), df['elapse'].max()+1, 2)).reset_index()

Output:

    elapse  data  datx
0        0   4.0   0.0
1        2   0.0   4.0
2        4   3.0   2.0
3        6   3.0   1.0
4        8   NaN   NaN
5       10   NaN   NaN
6       12   NaN   NaN
7       14   3.0   0.0
8       16   1.0   1.0
9       18   3.0   1.0
10      20   2.0   0.0
11      22   4.0   1.0
12      24   0.0   4.0
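The one-liner above hardcodes the step of 2. If the step is not known in advance, it could be inferred from the data. The following is only a sketch: it assumes integer elapse values and that the smallest difference between consecutive values is the intended interval. The helper name fill_missing_rows is hypothetical, not part of pandas.

```python
import numpy as np
import pandas as pd


def fill_missing_rows(df, key="elapse"):
    """Reindex df on `key` over a regular grid; new rows hold NaN."""
    # Assumption: the smallest gap between sorted key values is the step.
    values = np.sort(df[key].to_numpy())
    step = np.diff(values).min()
    # Full grid from the first to the last key value, inclusive.
    full = np.arange(values[0], values[-1] + step, step)
    return df.set_index(key).reindex(full).rename_axis(key).reset_index()


# Same values as the frame in the question.
df = pd.DataFrame({
    "elapse": [0, 2, 4, 6, 14, 16, 18, 20, 22, 24],
    "data": [4, 0, 3, 3, 3, 1, 3, 2, 4, 0],
    "datx": [0, 4, 2, 1, 0, 1, 1, 0, 1, 4],
})
filled = fill_missing_rows(df)
print(filled)
```

Because the whole grid is rebuilt in one reindex call, this should also cover several gaps at once, which the diff/idxmax approach in the question does not.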

Solution 2:[2]

You can simply create a new DataFrame with the new index:

df = pd.DataFrame.from_dict({
    'data': {0: 4, 2: 0, 4: 3, 6: 3, 14: 3, 16: 1, 18: 3, 20: 2, 22: 4, 24: 0},
    'datx': {0: 0, 2: 4, 4: 2, 6: 1, 14: 0, 16: 1, 18: 1, 20: 0, 22: 1, 24: 4}
})

result = pd.DataFrame(index=np.arange(0, 25, 2), data=df)
print(result)

#     data  datx
# 0    4.0   0.0
# 2    0.0   4.0
# 4    3.0   2.0
# 6    3.0   1.0
# 8    NaN   NaN
# 10   NaN   NaN
# 12   NaN   NaN
# 14   3.0   0.0
# 16   1.0   1.0
# 18   3.0   1.0
# 20   2.0   0.0
# 22   4.0   1.0
# 24   0.0   4.0
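The example above starts from a frame that is already keyed by elapse. Starting instead from the frame in the question, where elapse is an ordinary column, the same idea might look like the sketch below. It relies on the constructor aligning the existing frame to the index passed in, as Solution 2 does, and it assumes the range 0 to 24 with a step of 2.

```python
import numpy as np
import pandas as pd

# Same values as the frame in the question, with elapse as a regular column.
df = pd.DataFrame({
    "elapse": [0, 2, 4, 6, 14, 16, 18, 20, 22, 24],
    "data": [4, 0, 3, 3, 3, 1, 3, 2, 4, 0],
    "datx": [0, 4, 2, 1, 0, 1, 1, 0, 1, 4],
})

# Move elapse into the index so the constructor can align on it.
indexed = df.set_index("elapse")

# Rows for 8, 10 and 12 do not exist in `indexed`, so they come out as NaN.
result = pd.DataFrame(index=np.arange(0, 25, 2), data=indexed)

# Turn the index back into an elapse column.
result = result.rename_axis("elapse").reset_index()
print(result)
```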

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 hilberts_drinking_problem