'How can I resolve an empty Pandas Dataframe when using Inner merge?
I'm trying to merge two DataFrames together based off 'Geographic Area'. It's successfully adding the column but says it's an empty Index. The Left join keeps the first DataFrame intact, the Right join keeps the second DataFrame intact. Outer does neither. It puts NaN in the merged column. This leads me to believe that it's not recognizing "Geographic Area" as being the same data. There are the exact same entries in both columns.
Here's the "error" I'm getting. So it's successfully adding the Unemployment Rate as column but none of the data is populated.
Empty DataFrame
Columns: [Geographic Area, 4/1/2010 Census, 4/1/2020 Census, % Change, Unemployment Rate]
Index: []
Here's a print of the first two entries of df3:
Geographic Area Unemployment Rate
0 Acadia 3.6
1 Allen 4.0
Here's a print of the first two entries of df1:
Geographic Area 4/1/2010 Census 4/1/2020 Census % Change
0 Acadia 61787 57576 -6.815350
1 Allen 25747 22750 -11.640191
Here's the whole code
import pandas as pd
# Used for printing in Pycharm
pd.set_option('display.width', 400)
pd.set_option('display.max_columns', 10)
df1 = pd.read_excel('C:/Users/Notebook/Downloads/louisiana-2020-census-parish.xlsx', skiprows=1)
df2 = pd.read_excel('Parish Unemployment.xlsx', skiprows=2)
data = []
# The original df2 has Parish and , LA at the end of each, so this removes it from each row
for index, row in df2.iterrows():
area = row['Area']
area = area[:-10]
row[0] = area
data.append(row)
df3 = pd.DataFrame(data, columns=['Area', 'Rate'])
df3.rename(columns={"Area": "Geographic Area", "Rate": "Unemployment Rate"}, inplace=True)
# Used to create a % for Plotly/Dash Choropleth Maps
df3['Unemployment Rate'] = df3['Unemployment Rate'].astype(float)
df3['Unemployment Rate'] = df3['Unemployment Rate'].multiply(100)
df1.rename(columns={"Percent Change in Population 2010-2020": "% Change"}, inplace=True)
# Used to create a % for Plotly/Dash Choropleth Maps
df1['% Change'] = df1['% Change'].astype(float)
df1['% Change'] = df1['% Change'].multiply(100)
df = pd.merge(df1, df3, on='Geographic Area', how='inner')
print(df)
Solution 1:[1]
I figured it out. In both dataframes, the 'Geographic' area were string objects, but it turns out df3 had trailing spaces. I simply added .rstrip() to fix it and the dataframe merged successfully. Working code below.
import pandas as pd
# Used for printing in Pycharm
pd.set_option('display.width', 400)
pd.set_option('display.max_columns', 10)
df1 = pd.read_excel('C:/Users/Notebook/Downloads/louisiana-2020-census-parish.xlsx', skiprows=1)
df2 = pd.read_excel('Parish Unemployment.xlsx', skiprows=2)
data = []
# The original df2 has Parish and , LA at the end of each, so this removes it from each row
for index, row in df2.iterrows():
area = row['Area']
area = area[:-10]
row[0] = area.rstrip()
data.append(row)
df3 = pd.DataFrame(data, columns=['Area', 'Rate'])
df3.rename(columns={"Area": "Geographic Area", "Rate": "Unemployment Rate"}, inplace=True)
# Used to create a % for Plotly/Dash Choropleth Maps
df3['Unemployment Rate'] = df3['Unemployment Rate'].astype(float)
df3['Unemployment Rate'] = df3['Unemployment Rate'].multiply(100)
df1.rename(columns={"Percent Change in Population 2010-2020": "% Change"}, inplace=True)
# Used to create a % for Plotly/Dash Choropleth Maps
df1['% Change'] = df1['% Change'].astype(float)
df1['% Change'] = df1['% Change'].multiply(100)
df = pd.merge(df1, df3, on='Geographic Area', how='inner')
print(df)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Brandon Jacobson |
