'interpolation of missing values not NA
I want to interpolate (linear interpolation) data. but not indicated NA.
Here is my data.
| timestamp | id | strength |
|---|---|---|
| 1383260400000 | 10 | 0.007802251505435157 |
| 1383260400000 | 11 | 0.0050101566841440105 |
| 1383260400000 | 47 | 0.09910993935546883 |
| 1383260400000 | 48 | 0.16302926693340128 |
and expected data is :
| timestamp | id | strength |
|---|---|---|
| 1383260400000 | 10 | 0.007802251505435157 |
| 1383260400000 | 11 | 0.0050101566841440105 |
| 1383260400000 | 12 | interpolated strength |
| 1383260400000 | 13 | interpolated strength |
| 1383260400000 | 14 | interpolated strength |
| 1383260400000 | 15 | interpolated strength |
| 1383260400000 | 16 | interpolated strength |
| 1383260400000 | 17 | interpolated strength |
| 1383260400000 | ... | interpolated strength |
| 1383260400000 | 47 | 0.09910993935546883 |
| 1383260400000 | 48 | 0.16302926693340128 |
and there are data with another timestamp.
| timestamp | id | strength |
|---|---|---|
| 1383261000000 | 73 | 0.00034018953748529387 |
| 1383261000000 | 80 | 0.015745603609017354 |
| 1383261000000 | 81 | 0.01332206498346922 |
| 1383261000000 | 101 | 0.003292329017912283 |
| 1383261000000 | 102 | 0.0028735259562922954 |
| 1383261000000 | 103 | 0.0033962211735905955 |
and the last id is 2025 and timestamp is 13833462000000
Solution 1:[1]
First convert values of id to index, so possible use lambda function in GroupBy.apply with Series.reindex and Series.interpolate:
f = lambda x: x.reindex(range(x.index.min(), x.index.max() + 1)).interpolate()
df = df.set_index('id').groupby('timestamp')['strength'].apply(f).reset_index()
print (df)
timestamp id strength
0 1383260400000 10 0.007802
1 1383260400000 11 0.005010
2 1383260400000 12 0.007624
3 1383260400000 13 0.010238
4 1383260400000 14 0.012852
5 1383260400000 15 0.015466
6 1383260400000 16 0.018080
7 1383260400000 17 0.020693
8 1383260400000 18 0.023307
9 1383260400000 19 0.025921
10 1383260400000 20 0.028535
11 1383260400000 21 0.031149
12 1383260400000 22 0.033763
13 1383260400000 23 0.036377
14 1383260400000 24 0.038991
15 1383260400000 25 0.041605
16 1383260400000 26 0.044218
17 1383260400000 27 0.046832
18 1383260400000 28 0.049446
19 1383260400000 29 0.052060
20 1383260400000 30 0.054674
21 1383260400000 31 0.057288
22 1383260400000 32 0.059902
23 1383260400000 33 0.062516
24 1383260400000 34 0.065129
25 1383260400000 35 0.067743
26 1383260400000 36 0.070357
27 1383260400000 37 0.072971
28 1383260400000 38 0.075585
29 1383260400000 39 0.078199
30 1383260400000 40 0.080813
31 1383260400000 41 0.083427
32 1383260400000 42 0.086041
33 1383260400000 43 0.088654
34 1383260400000 44 0.091268
35 1383260400000 45 0.093882
36 1383260400000 46 0.096496
37 1383260400000 47 0.099110
38 1383260400000 48 0.163029
Solution 2:[2]
Use:
#preparing data
str1 = """timestamp id strength
13 10 0.007802251505435157
13 11 0.0050101566841440105
13 47 0.09910993935546883
14 48 5
14 60 11"""
data = [x.split() for x in str1.split('\n')]
df = pd.DataFrame(data[1:], columns = data[0])
df['id'] = df['id'].astype(int)
df['strength'] = df['strength'].astype(float)
df['timestamp'] = df['timestamp'].astype(float)
#solution
res = df.groupby('timestamp').agg({'id': lambda x: range(x.min(), x.max()+1)}).explode('id').reset_index().merge(df, on=['timestamp', 'id'], how='outer')
res['strength'] = res.groupby(['strength'])['strength'].transform(lambda x: x.interpolate())
res
Output:
timestamp id strength
0 13.0 10.0 0.007802
1 13.0 11.0 0.005010
2 13.0 12.0 0.007624
3 13.0 13.0 0.010238
4 13.0 14.0 0.012852
5 13.0 15.0 0.015466
6 13.0 16.0 0.018080
7 13.0 17.0 0.020693
8 13.0 18.0 0.023307
9 13.0 19.0 0.025921
10 13.0 20.0 0.028535
11 13.0 21.0 0.031149
12 13.0 22.0 0.033763
13 13.0 23.0 0.036377
14 13.0 24.0 0.038991
15 13.0 25.0 0.041605
16 13.0 26.0 0.044218
17 13.0 27.0 0.046832
18 13.0 28.0 0.049446
19 13.0 29.0 0.052060
20 13.0 30.0 0.054674
21 13.0 31.0 0.057288
22 13.0 32.0 0.059902
23 13.0 33.0 0.062516
24 13.0 34.0 0.065129
25 13.0 35.0 0.067743
26 13.0 36.0 0.070357
27 13.0 37.0 0.072971
28 13.0 38.0 0.075585
29 13.0 39.0 0.078199
30 13.0 40.0 0.080813
31 13.0 41.0 0.083427
32 13.0 42.0 0.086041
33 13.0 43.0 0.088654
34 13.0 44.0 0.091268
35 13.0 45.0 0.093882
36 13.0 46.0 0.096496
37 14.0 48.0 5.000000
38 14.0 49.0 5.500000
39 14.0 50.0 6.000000
40 14.0 51.0 6.500000
41 14.0 52.0 7.000000
42 14.0 53.0 7.500000
43 14.0 54.0 8.000000
44 14.0 55.0 8.500000
45 14.0 56.0 9.000000
46 14.0 57.0 9.500000
47 14.0 58.0 10.000000
48 14.0 59.0 10.500000
49 13.0 47.0 0.099110
50 14.0 60.0 11.000000
Solution 3:[3]
You can try groupby timestamp then reindex to make the id column continuous and fill the NA
out = (df.groupby('timestamp')
.apply(lambda g: (g.set_index('id')
.reindex(range(g['id'].min(), g['id'].max()+1))
.reset_index()
.pipe(lambda df: df.fillna({'strength': 'interpolated strength', 'timestamp':g.name}))))
.reset_index(drop=True))
out['timestamp'] = out['timestamp'].astype(int)
print(out)
id timestamp strength
0 10 1383260400000 0.007802
1 11 1383260400000 0.00501
2 12 1383260400000 interpolated strength
3 13 1383260400000 interpolated strength
4 14 1383260400000 interpolated strength
5 15 1383260400000 interpolated strength
6 16 1383260400000 interpolated strength
7 17 1383260400000 interpolated strength
8 18 1383260400000 interpolated strength
9 19 1383260400000 interpolated strength
10 20 1383260400000 interpolated strength
11 21 1383260400000 interpolated strength
12 22 1383260400000 interpolated strength
13 23 1383260400000 interpolated strength
14 24 1383260400000 interpolated strength
15 25 1383260400000 interpolated strength
16 26 1383260400000 interpolated strength
17 27 1383260400000 interpolated strength
18 28 1383260400000 interpolated strength
19 29 1383260400000 interpolated strength
20 30 1383260400000 interpolated strength
21 31 1383260400000 interpolated strength
22 32 1383260400000 interpolated strength
23 33 1383260400000 interpolated strength
24 34 1383260400000 interpolated strength
25 35 1383260400000 interpolated strength
26 36 1383260400000 interpolated strength
27 37 1383260400000 interpolated strength
28 38 1383260400000 interpolated strength
29 39 1383260400000 interpolated strength
30 40 1383260400000 interpolated strength
31 41 1383260400000 interpolated strength
32 42 1383260400000 interpolated strength
33 43 1383260400000 interpolated strength
34 44 1383260400000 interpolated strength
35 45 1383260400000 interpolated strength
36 46 1383260400000 interpolated strength
37 47 1383260400000 0.09911
38 48 1383260400000 0.163029
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | Ynjxsjmh |
