'Concatenate Dataframes on common values yields NaN values for non-matches [python]
I am trying to merge/concatenate two dataframes on a common column and match up all the corresponding values. However, while matching values receive corresponding values for that row, when there is no match, a NaN value is produced. I am using python for this. I will explain here in more detail.
I have this dataframe A:
ID Area Distance Height Temp
----------------------------------------------------
0 100 8.31 0 1.30 24.27
1 101 3.11 0 1.29 25.99
2 102 5.10 0 1.23 29.51
3 105 9.70 0 1.97 15.17
4 107 4.77 0 1.53 27.84
...
Each ID represents a different building footprint (polygon), with its Area Recorded, the height of the building, and the mean outdoor temperature recorded at the site of the building. The "Distance" column denotes the distance away from the building at which the temperature was recorded, and so onsite = 0 meters away.
And I have this dataframe B:
ID Temp Distance
---------------------------------
0 100 25.68 5
1 100 26.05 10
2 100 26.85 15
3 100 27.25 20
4 100 27.78 25
5 101 22.68 5
6 101 26.44 10
7 101 26.83 15
8 101 27.26 20
9 101 28.38 25
10 102 25.63 5
11 102 26.26 10
12 102 26.57 15
13 102 26.91 20
14 102 28.84 25
15 105 25.33 5
16 105 26.25 10
17 105 26.54 15
18 105 26.23 20
19 105 27.53 25
20 107 25.23 5
21 107 26.73 10
22 107 26.26 15
23 107 26.11 20
24 107 27.16 25
...
This shows for the same building IDs the temperatures recorded at different distances away from the building, and so for each building I want the recorded mean outdoor temperature 5 meters away, 10 meters away, 15 meters away, 20 meters away, and 25 meters away.
What I want to do is join dataframes A and B by the common "ID" column. And so what I want to do is produce a dataframe C that shows for each ID that buildings Temperature at Distances 0, 5, 10, 15, 20, and 25. The issue is that I want for each building ID the Area and Height to remain the same, because of course the buildings area and height will not change! And so I want to produce the following dataframe C:
ID Area Distance Height Temp
----------------------------------------------------
0 100 8.31 0 1.30 24.27
1 100 8.31 5 1.30 25.68
2 100 8.31 10 1.30 26.05
3 100 8.31 15 1.30 26.85
4 100 8.31 20 1.30 27.25
5 100 8.31 25 1.30 27.78
6 101 3.11 0 1.29 25.99
7 101 3.11 5 1.29 22.68
8 101 3.11 10 1.29 26.44
9 101 3.11 15 1.29 26.83
10 101 3.11 20 1.29 27.26
11 101 3.11 25 1.29 28.38
12 102 5.10 0 1.23 29.51
13 102 5.10 5 1.23 25.63
14 102 5.10 10 1.23 26.26
15 102 5.10 15 1.23 26.57
16 102 5.10 20 1.23 26.91
17 102 5.10 25 1.23 28.84
18 105 9.70 0 1.97 15.17
19 105 9.70 5 1.97 25.33
20 105 9.70 10 1.97 26.25
21 105 9.70 15 1.97 26.54
22 105 9.70 20 1.97 26.23
23 105 9.70 25 1.97 27.53
24 107 4.77 0 1.53 27.84
25 107 4.77 5 1.53 25.23
26 107 4.77 10 1.53 26.73
27 107 4.77 15 1.53 26.26
28 107 4.77 20 1.53 26.11
29 107 4.77 25 1.53 27.16
...
And so to obtain this I try the following, trying to concatenate dataframes A and B on the "ID" column, and then sorting the rows by "ID" and "Distance":
df_C = pd.concat([df_A, df_B]).sort_values(["ID", "Distance"]).reset_index(drop=True)
However this yields:
ID Area Distance Height Temp
----------------------------------------------------
0 100 8.31 0 1.30 24.27
1 100 NaN 5 NaN 25.68
2 100 NaN 10 NaN 26.05
3 100 NaN 15 NaN 26.85
4 100 NaN 20 NaN 27.25
5 100 NaN 25 NaN 27.78
6 101 3.11 0 1.29 25.99
7 101 NaN 5 NaN 22.68
8 101 NaN 10 NaN 26.44
9 101 NaN 15 NaN 26.83
10 101 NaN 20 NaN 27.26
11 101 NaN 25 NaN 28.38
12 102 5.10 0 1.23 29.51
13 102 NaN 5 NaN 25.63
14 102 NaN 10 NaN 26.26
15 102 NaN 15 NaN 26.57
16 102 NaN 20 NaN 26.91
17 102 NaN 25 NaN 28.84
18 105 9.70 0 1.97 15.17
19 105 NaN 5 NaN 25.33
20 105 NaN 10 NaN 26.25
21 105 NaN 15 NaN 26.54
22 105 NaN 20 NaN 26.23
23 105 NaN 25 NaN 27.53
24 107 4.77 0 1.53 27.84
25 107 NaN 5 NaN 25.23
26 107 NaN 10 NaN 26.73
27 107 NaN 15 NaN 26.26
28 107 NaN 20 NaN 26.11
29 107 NaN 25 NaN 27.16
...
And so it appears that the Area and Height values are not getting matched up because Dataframe B does not contain the corresponding Area and Height Values, and so there is nothing to report there when I merge the two dataframes. How can I fix this issue so that I get my intended Dataframe C?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
