Python: how to match columns between 2 different dataframes
I've been searching around for a while now, but I can't seem to find the answer to this small problem.
I wrote the code below to match the columns of one dataframe to those of another, with some additional conditions. How can I shorten this process? The loops can make the computation slow.
import pandas as pd

data_bio_anak = {'nama':['James','Adinda','Joni','Zain', 'Linda'],
                 'age':[11, 12, 13, 16, 18],
                 'address':['kotabumi', 'tanjung duren', 'cipulir', 'kokas', 'ciputat'],
                 'food':['pizza','burger','bakso','mie ayam','seblak'],
                 'edukasi':['s1','s2','d3','sma','s3'],
                }
df_bio_anak = pd.DataFrame(data_bio_anak)
df_bio_anak
data_bio_dewasa = {'nama':['Sandy','Toni','Jami','Juda', 'Wong'],
                   'age':[21, 32, 43, 26, 28],
                   'address':['kotabumi', 'tanjung duren', 'cipulir', 'kokas', 'ciputat'],
                   'food':['pizza','burger','bakso','mie ayam','seblak'],
                   'edukasi':['s1','s2','d3','sma','s3'],
                   'status':['pacaran','single','menikah','pelajar','mahasiswa'],
                   'provinsi':['banten','jakarta','medan','sumatra','kalimantan']
                  }
df_bio_dewasa = pd.DataFrame(data_bio_dewasa)
df_bio_dewasa
In this case, I just want the columns of df_bio_anak to match the columns of df_bio_dewasa, with a few additional steps, as follows:
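df_bio_anak = pd.get_dummies(df_bio_anak)
# Drop the columns that df_bio_dewasa does not have
for c in df_bio_anak.columns:
    if c not in df_bio_dewasa.columns:
        df_bio_anak.drop(c, axis=1, inplace=True)
# Add the columns that only df_bio_dewasa has, filled with 0
for c in df_bio_dewasa.columns:
    if c not in df_bio_anak.columns:
        df_bio_anak[c] = 0
# Use the same column order as df_bio_dewasa
df_bio_anak = df_bio_anak[df_bio_dewasa.columns]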
Is there a simpler way to do this without looping? The loops take a long time to run.
Solution 1:[1]
You could try this:
cols_of_bio_anak_to_drop = [
    c for c in df_bio_anak.columns if c not in df_bio_dewasa.columns
]
df_bio_anak = df_bio_anak.drop(cols_of_bio_anak_to_drop, axis=1, inplace=False)
cols_to_add_to_bio_anak = [
    c for c in df_bio_dewasa.columns if c not in df_bio_anak.columns
]
df_bio_anak = df_bio_anak.reindex(
    columns=list(df_bio_anak.columns) + cols_to_add_to_bio_anak, fill_value=0
)
print(df_bio_anak)
# Outputs
     nama  age        address      food edukasi  status  provinsi
0   James   11       kotabumi     pizza      s1       0         0
1  Adinda   12  tanjung duren    burger      s2       0         0
2    Joni   13        cipulir     bakso      d3       0         0
3    Zain   16          kokas  mie ayam     sma       0         0
4   Linda   18        ciputat    seblak      s3       0         0
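As a possible shortening (not part of the original answer), the drop, add, and reorder steps can likely be collapsed into a single `reindex` call, since `DataFrame.reindex(columns=...)` keeps only the requested labels, in the requested order, and fills labels that do not exist yet with `fill_value`. The sketch below rebuilds the two frames from the question so it runs on its own; the names `aligned` and `aligned_dummies` are illustrative only.

```python
import pandas as pd

# Same sample data as in the question
df_bio_anak = pd.DataFrame({
    'nama': ['James', 'Adinda', 'Joni', 'Zain', 'Linda'],
    'age': [11, 12, 13, 16, 18],
    'address': ['kotabumi', 'tanjung duren', 'cipulir', 'kokas', 'ciputat'],
    'food': ['pizza', 'burger', 'bakso', 'mie ayam', 'seblak'],
    'edukasi': ['s1', 's2', 'd3', 'sma', 's3'],
})
df_bio_dewasa = pd.DataFrame({
    'nama': ['Sandy', 'Toni', 'Jami', 'Juda', 'Wong'],
    'age': [21, 32, 43, 26, 28],
    'address': ['kotabumi', 'tanjung duren', 'cipulir', 'kokas', 'ciputat'],
    'food': ['pizza', 'burger', 'bakso', 'mie ayam', 'seblak'],
    'edukasi': ['s1', 's2', 'd3', 'sma', 's3'],
    'status': ['pacaran', 'single', 'menikah', 'pelajar', 'mahasiswa'],
    'provinsi': ['banten', 'jakarta', 'medan', 'sumatra', 'kalimantan'],
})

# One call: drop columns df_bio_dewasa lacks, add the ones it has (filled
# with 0), and put everything in df_bio_dewasa's column order.
aligned = df_bio_anak.reindex(columns=df_bio_dewasa.columns, fill_value=0)
print(aligned)

# The same call applied after get_dummies, as in the question's own code.
# Dummy columns such as 'nama_James' are not in df_bio_dewasa, so they are
# dropped, and 'nama' is re-created as a column of zeros.
aligned_dummies = pd.get_dummies(df_bio_anak).reindex(
    columns=df_bio_dewasa.columns, fill_value=0
)
print(aligned_dummies)
```

Because `reindex` works on the whole column index at once, no Python-level loop or list comprehension over the columns is needed.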
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Laurent |
