'part of data become nan after merging
I am trying merge these 2 tables and here is the code First Table code and result
import pandas as pd
dfpopulation = pd.read_csv("BoroughPopulation.csv")
dfpopulation = dfpopulation.filter(["Borough","2020"], axis = 1)
dfpopulation["2020"]=dfpopulation['2020'].astype(int)
dfpopulation.head
<bound method NDFrame.head of Borough 2020
0 NYC Total 8550971
1 Bronx 1446788
2 Brooklyn 2648452
3 Manhattan 1638281
4 Queens 2330295
5 Staten Island 487155>
Second Table code and result
Homeless = [["NYC Total",totalNYC],["Bronx",totalBronx],["Brooklyn",totalBrooklyn],["Manhattan",totalManhattan],["Queens",totalQueens],["Staten Island",totalStatenIsland]]
dfhomeless2 = pd.DataFrame(Homeless, columns= ['Borough','Total'])
dfhomeless2["Total"]=dfhomeless2["Total"].astype(int)
dfhomeless2.head
<bound method NDFrame.head of Borough Total
0 NYC Total 97104
1 Bronx 16875
2 Brooklyn 33832
3 Manhattan 36821
4 Queens 9576
5 Staten Island 0>
Code I used and result
dfCompare = pd.merge(dfpopulation,dfhomeless2[["Borough","Total"]],on="Borough",how="left")
dfCompare.head
<bound method NDFrame.head of Borough 2020 Total
0 NYC Total 8550971 97104.0
1 Bronx 1446788 NaN
2 Brooklyn 2648452 NaN
3 Manhattan 1638281 NaN
4 Queens 2330295 NaN
5 Staten Island 487155 NaN>
Info of the result
<class 'pandas.core.frame.DataFrame'>
Int64Index: 6 entries, 0 to 5
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Borough 6 non-null object
1 2020 6 non-null int32
2 Total 1 non-null float64
dtypes: float64(1), int32(1), object(1)
memory usage: 168.0+ bytes
Anyone know why part of data become nan and how to solve it?
Solution 1:[1]
I entered your values/code and it works fine. I assume there is a discrepancy in the names of the boroughs in your dataframes. Make sure they match. Because you are using a left join, the new column will include all values from the left dataframe and only the right dataframe ones IF there is a matching borough - otherwise, it returns a NaN, which is what you are seeing. See that the following works on your computer. You could also try how='outer' which will still give you a bunch of NaNs but may help you figure out where your issues are.
A = pd.DataFrame({
'borough': ['NYC Total', 'Bronx', 'Brooklyn', 'Manhattan', 'Queens','Staten Island'],
'total' : [8550971,1446788,2648452,1638281,330295, 487155]
})
B = pd.DataFrame({
'borough': ['NYC Total', 'Bronx', 'Brooklyn', 'Manhattan', 'Queens','Staten Island'],
'2020' : [97104,16875,33832,36821, 9576,0]
})
pd.merge(left=A, right=B, on='borough', how='left')
borough total 2020
0 NYC Total 8550971 97104
1 Bronx 1446788 16875
2 Brooklyn 2648452 33832
3 Manhattan 1638281 36821
4 Queens 330295 9576
5 Staten Island 487155 0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Coup |
