Python: how to match columns between 2 different dataframes

I've been searching around for a while now, but I can't seem to find the answer to this small problem.

I wrote the code below to match the columns of one dataframe to those of another, with a few additional conditions. How can I shorten this process? The loops could make the computation slow.

import pandas as pd

data_bio_anak = {'nama':['James','Adinda','Joni','Zain', 'Linda'],
            'age':[11, 12, 13, 16, 18],
            'address':['kotabumi', 'tanjung duren', 'cipulir', 'kokas', 'ciputat'],
            'food':['pizza','burger','bakso','mie ayam','seblak'],
            'edukasi':['s1','s2','d3','sma','s3'],
           }

df_bio_anak = pd.DataFrame(data_bio_anak)
df_bio_anak

data_bio_dewasa = {'nama':['Sandy','Toni','Jami','Juda', 'Wong'],
            'age':[21, 32, 43, 26, 28],
            'address':['kotabumi', 'tanjung duren', 'cipulir', 'kokas', 'ciputat'],
            'food':['pizza','burger','bakso','mie ayam','seblak'],
            'edukasi':['s1','s2','d3','sma','s3'],
            'status':['pacaran','single','menikah','pelajar','mahasiswa'],
            'provinsi':['banten','jakarta','medan','sumatra','kalimantan']
           }

df_bio_dewasa = pd.DataFrame(data_bio_dewasa)
df_bio_dewasa

In this case, I just want every column of df_bio_anak to match the columns of df_bio_dewasa, with some additional steps as follows:

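# One-hot encode the string columns (e.g. 'nama' becomes 'nama_James', ...)
df_bio_anak = pd.get_dummies(df_bio_anak)

# Drop columns that df_bio_dewasa does not have
for c in df_bio_anak.columns:
    if c not in df_bio_dewasa.columns:
        df_bio_anak.drop(c, axis=1, inplace=True)

# Add columns that only df_bio_dewasa has, filled with 0
for c in df_bio_dewasa.columns:
    if c not in df_bio_anak.columns:
        df_bio_anak[c] = 0

# Use the same column order as df_bio_dewasa
df_bio_anak = df_bio_anak[df_bio_dewasa.columns]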

Is there a simpler way to do this without looping? I'm concerned that the loops will take a long time to compute.



Solution 1:[1]

You could try this:

cols_of_bio_anak_to_drop = [
    c for c in df_bio_anak.columns if c not in df_bio_dewasa.columns
]
df_bio_anak = df_bio_anak.drop(columns=cols_of_bio_anak_to_drop)

cols_to_add_to_bio_anak = [
    c for c in df_bio_dewasa.columns if c not in df_bio_anak.columns
]

df_bio_anak = df_bio_anak.reindex(
    columns=list(df_bio_anak.columns) + cols_to_add_to_bio_anak, fill_value=0
)

print(df_bio_anak)
# Output (for the original df_bio_anak, without pd.get_dummies applied)
     nama  age        address      food edukasi  status  provinsi
0   James   11       kotabumi     pizza      s1       0         0
1  Adinda   12  tanjung duren    burger      s2       0         0
2    Joni   13        cipulir     bakso      d3       0         0
3    Zain   16          kokas  mie ayam     sma       0         0
4   Linda   18        ciputat    seblak      s3       0         0
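
If the goal is simply for df_bio_anak to end up with exactly the columns of df_bio_dewasa, in the same order, a single reindex call might be enough. This is a minimal sketch, not part of the original answer: passing the target columns to DataFrame.reindex drops the columns that are absent from the target, adds the missing ones filled with fill_value, and puts everything in the target order. The names encoded, aligned and plain are illustrative only.

```python
import pandas as pd

df_bio_anak = pd.DataFrame({
    "nama": ["James", "Adinda", "Joni", "Zain", "Linda"],
    "age": [11, 12, 13, 16, 18],
    "address": ["kotabumi", "tanjung duren", "cipulir", "kokas", "ciputat"],
    "food": ["pizza", "burger", "bakso", "mie ayam", "seblak"],
    "edukasi": ["s1", "s2", "d3", "sma", "s3"],
})

df_bio_dewasa = pd.DataFrame({
    "nama": ["Sandy", "Toni", "Jami", "Juda", "Wong"],
    "age": [21, 32, 43, 26, 28],
    "address": ["kotabumi", "tanjung duren", "cipulir", "kokas", "ciputat"],
    "food": ["pizza", "burger", "bakso", "mie ayam", "seblak"],
    "edukasi": ["s1", "s2", "d3", "sma", "s3"],
    "status": ["pacaran", "single", "menikah", "pelajar", "mahasiswa"],
    "provinsi": ["banten", "jakarta", "medan", "sumatra", "kalimantan"],
})

# Same steps as in the question: one-hot encode first, then align the columns.
encoded = pd.get_dummies(df_bio_anak)

# One call: keep only the target columns, add missing ones as 0, reorder.
aligned = encoded.reindex(columns=df_bio_dewasa.columns, fill_value=0)

# Without get_dummies, the shared columns keep their original values.
plain = df_bio_anak.reindex(columns=df_bio_dewasa.columns, fill_value=0)

print(aligned)
print(plain)
```

In the encoded case only age survives, because get_dummies renames the string columns (nama becomes nama_James, nama_Adinda and so on), so every other column is filled with 0. That is also what the loops in the question produce.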

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Laurent