'interpolation of missing values not NA

I want to interpolate (linear interpolation) data. but not indicated NA.

Here is my data.

timestamp id strength
1383260400000 10 0.007802251505435157
1383260400000 11 0.0050101566841440105
1383260400000 47 0.09910993935546883
1383260400000 48 0.16302926693340128

and expected data is :

timestamp id strength
1383260400000 10 0.007802251505435157
1383260400000 11 0.0050101566841440105
1383260400000 12 interpolated strength
1383260400000 13 interpolated strength
1383260400000 14 interpolated strength
1383260400000 15 interpolated strength
1383260400000 16 interpolated strength
1383260400000 17 interpolated strength
1383260400000 ... interpolated strength
1383260400000 47 0.09910993935546883
1383260400000 48 0.16302926693340128

and there are data with another timestamp.

timestamp id strength
1383261000000 73 0.00034018953748529387
1383261000000 80 0.015745603609017354
1383261000000 81 0.01332206498346922
1383261000000 101 0.003292329017912283
1383261000000 102 0.0028735259562922954
1383261000000 103 0.0033962211735905955

and the last id is 2025 and timestamp is 13833462000000



Solution 1:[1]

First convert values of id to index, so possible use lambda function in GroupBy.apply with Series.reindex and Series.interpolate:

f = lambda x: x.reindex(range(x.index.min(), x.index.max() + 1)).interpolate()
df = df.set_index('id').groupby('timestamp')['strength'].apply(f).reset_index()

print (df)
        timestamp  id  strength
0   1383260400000  10  0.007802
1   1383260400000  11  0.005010
2   1383260400000  12  0.007624
3   1383260400000  13  0.010238
4   1383260400000  14  0.012852
5   1383260400000  15  0.015466
6   1383260400000  16  0.018080
7   1383260400000  17  0.020693
8   1383260400000  18  0.023307
9   1383260400000  19  0.025921
10  1383260400000  20  0.028535
11  1383260400000  21  0.031149
12  1383260400000  22  0.033763
13  1383260400000  23  0.036377
14  1383260400000  24  0.038991
15  1383260400000  25  0.041605
16  1383260400000  26  0.044218
17  1383260400000  27  0.046832
18  1383260400000  28  0.049446
19  1383260400000  29  0.052060
20  1383260400000  30  0.054674
21  1383260400000  31  0.057288
22  1383260400000  32  0.059902
23  1383260400000  33  0.062516
24  1383260400000  34  0.065129
25  1383260400000  35  0.067743
26  1383260400000  36  0.070357
27  1383260400000  37  0.072971
28  1383260400000  38  0.075585
29  1383260400000  39  0.078199
30  1383260400000  40  0.080813
31  1383260400000  41  0.083427
32  1383260400000  42  0.086041
33  1383260400000  43  0.088654
34  1383260400000  44  0.091268
35  1383260400000  45  0.093882
36  1383260400000  46  0.096496
37  1383260400000  47  0.099110
38  1383260400000  48  0.163029

Solution 2:[2]

Use:

#preparing data
str1 = """timestamp id  strength
13  10  0.007802251505435157
13  11  0.0050101566841440105
13  47  0.09910993935546883
14  48  5
14  60  11"""
data = [x.split() for x in str1.split('\n')]
df = pd.DataFrame(data[1:], columns = data[0])


df['id'] = df['id'].astype(int)
df['strength'] = df['strength'].astype(float)
df['timestamp'] = df['timestamp'].astype(float)


#solution

res = df.groupby('timestamp').agg({'id': lambda x: range(x.min(), x.max()+1)}).explode('id').reset_index().merge(df, on=['timestamp', 'id'], how='outer')
res['strength'] = res.groupby(['strength'])['strength'].transform(lambda x: x.interpolate())
res

Output:

timestamp   id  strength
0   13.0    10.0    0.007802
1   13.0    11.0    0.005010
2   13.0    12.0    0.007624
3   13.0    13.0    0.010238
4   13.0    14.0    0.012852
5   13.0    15.0    0.015466
6   13.0    16.0    0.018080
7   13.0    17.0    0.020693
8   13.0    18.0    0.023307
9   13.0    19.0    0.025921
10  13.0    20.0    0.028535
11  13.0    21.0    0.031149
12  13.0    22.0    0.033763
13  13.0    23.0    0.036377
14  13.0    24.0    0.038991
15  13.0    25.0    0.041605
16  13.0    26.0    0.044218
17  13.0    27.0    0.046832
18  13.0    28.0    0.049446
19  13.0    29.0    0.052060
20  13.0    30.0    0.054674
21  13.0    31.0    0.057288
22  13.0    32.0    0.059902
23  13.0    33.0    0.062516
24  13.0    34.0    0.065129
25  13.0    35.0    0.067743
26  13.0    36.0    0.070357
27  13.0    37.0    0.072971
28  13.0    38.0    0.075585
29  13.0    39.0    0.078199
30  13.0    40.0    0.080813
31  13.0    41.0    0.083427
32  13.0    42.0    0.086041
33  13.0    43.0    0.088654
34  13.0    44.0    0.091268
35  13.0    45.0    0.093882
36  13.0    46.0    0.096496
37  14.0    48.0    5.000000
38  14.0    49.0    5.500000
39  14.0    50.0    6.000000
40  14.0    51.0    6.500000
41  14.0    52.0    7.000000
42  14.0    53.0    7.500000
43  14.0    54.0    8.000000
44  14.0    55.0    8.500000
45  14.0    56.0    9.000000
46  14.0    57.0    9.500000
47  14.0    58.0    10.000000
48  14.0    59.0    10.500000
49  13.0    47.0    0.099110
50  14.0    60.0    11.000000

Solution 3:[3]

You can try groupby timestamp then reindex to make the id column continuous and fill the NA

out = (df.groupby('timestamp')
       .apply(lambda g: (g.set_index('id')
                         .reindex(range(g['id'].min(), g['id'].max()+1))
                         .reset_index()
                         .pipe(lambda df: df.fillna({'strength': 'interpolated strength', 'timestamp':g.name}))))
       .reset_index(drop=True))
out['timestamp'] = out['timestamp'].astype(int)
print(out)

    id      timestamp               strength
0   10  1383260400000               0.007802
1   11  1383260400000                0.00501
2   12  1383260400000  interpolated strength
3   13  1383260400000  interpolated strength
4   14  1383260400000  interpolated strength
5   15  1383260400000  interpolated strength
6   16  1383260400000  interpolated strength
7   17  1383260400000  interpolated strength
8   18  1383260400000  interpolated strength
9   19  1383260400000  interpolated strength
10  20  1383260400000  interpolated strength
11  21  1383260400000  interpolated strength
12  22  1383260400000  interpolated strength
13  23  1383260400000  interpolated strength
14  24  1383260400000  interpolated strength
15  25  1383260400000  interpolated strength
16  26  1383260400000  interpolated strength
17  27  1383260400000  interpolated strength
18  28  1383260400000  interpolated strength
19  29  1383260400000  interpolated strength
20  30  1383260400000  interpolated strength
21  31  1383260400000  interpolated strength
22  32  1383260400000  interpolated strength
23  33  1383260400000  interpolated strength
24  34  1383260400000  interpolated strength
25  35  1383260400000  interpolated strength
26  36  1383260400000  interpolated strength
27  37  1383260400000  interpolated strength
28  38  1383260400000  interpolated strength
29  39  1383260400000  interpolated strength
30  40  1383260400000  interpolated strength
31  41  1383260400000  interpolated strength
32  42  1383260400000  interpolated strength
33  43  1383260400000  interpolated strength
34  44  1383260400000  interpolated strength
35  45  1383260400000  interpolated strength
36  46  1383260400000  interpolated strength
37  47  1383260400000                0.09911
38  48  1383260400000               0.163029

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 Ynjxsjmh