'How to combine dataframes but combine matching values and marking when there's a match?

New to datascience, sorry about the confusing wording. I need to combine two dataframes that have similar data. If column "C Name", exists in both, then I don't want it duplicated but a checkmark placed in both "SW1 contain" and "SW2 contain". If a C Name is unique, its just added to the combined table with its respective checkmark.

These are 2 simplified samples of my tables. I've to combine 4 frames together

df_s = pd.DataFrame({'C Category': ["Hull","Hull","Hull","Mid High","Mid High","Mid High2"],
                     'C Name': ["Trunk","Roof","Ceiling","Floor","Plank","Fuel"],
                     'SW1 Contain': ["NAN","NAN","NAN","NAN","NAN","NAN"],
                     'SW1 Type': ["NAN","NAN","NAN","NAN","NAN","NAN"],
                     'SW2 Contain': ["√","√","√","√","√","√"],
                     'SW2 Type': ["NAN","NAN","NAN","NAN","NAN","NAN"]})

  C Category   C Name SW1 Contain SW1 Type SW2 Contain SW2 Type
0       Hull    Trunk         NAN      NAN           √      NAN
1       Hull     Roof         NAN      NAN           √      NAN
2       Hull  Ceiling         NAN      NAN           √      NAN
3   Mid High    Floor         NAN      NAN           √      NAN
4   Mid High    Plank         NAN      NAN           √      NAN
5  Mid High2     Fuel         NAN      NAN           √      NAN

df_s2 = pd.DataFrame({'C Category': ["Hull","Hull","Lower","Lower2","Mid High","Mid High2"],
                     'C Name': ["Trunk","Roof","Tank","Axel","Floor","Fuel"],
                     'SW1 Contain': ["√","√","√","√","√","√"],
                     'SW1 Type': ["NAN","NAN","NAN","NAN","NAN","NAN"],
                     'SW2 Contain': ["NAN","NAN","NAN","NAN","NAN","NAN"],
                     'SW2 Type': ["NAN","NAN","NAN","NAN","NAN","NAN"]})

  C Category C Name SW1 Contain SW1 Type SW2 Contain SW2 Type
0       Hull  Trunk           √      NAN         NAN      NAN
1       Hull   Roof           √      NAN         NAN      NAN
2      Lower   Tank           √      NAN         NAN      NAN
3     Lower2   Axel           √      NAN         NAN      NAN
4   Mid High  Floor           √      NAN         NAN      NAN
5  Mid High2   Fuel           √      NAN         NAN      NAN

I need it to look something like this(Order of C Category/Name doesn't matter)

  C Category   C Name SW1 Contain SW1 Type SW2 Contain SW2 Type
0       Hull    Trunk           √      NAN           √      NAN
1       Hull     Roof           √      NAN           √      NAN
2       Hull  Ceiling         NAN      NAN           √      NAN
3      Lower     Tank           √      NAN         NAN      NAN
4     Lower2     Axel           √      NAN         NAN      NAN
5   Mid High    Floor           √      NAN           √      NAN
6   Mid High    Plank         NAN      NAN           √      NAN
7  Mid High2     Fuel           √      NAN           √      NAN

I've tried pd.concat([df_s,df[~df.isin(df_s)].dropna()]) (df is a blank dataframe that I'm adding the 4 tables to 1 by 1). The idea is to check if something doesn't exist in df, it it doesn't add it. But it doesn't work after 1 table is added. If that's not the best way to do this please let me know.

I'm also thinking of iterating thorough the frames and doing if statements. Something like if a row isn't in the df, append it to df. But that will be extremely costly so please let me know of the best wat to approach this. I'm still learning so would like any advice.



Solution 1:[1]

One option is to replace the "NaN" strings to NaN values, concatenate then use groupby.first to get rid of duplicate "C Category" and "C Name" pairs:

out = (pd.concat([df_s.replace('NAN', float('nan')), df_s2.replace('NAN', float('nan'))])
       .groupby(['C Category', 'C Name'], as_index=False, sort=False).first().fillna(float('nan')))

Output:

  C Category   C Name SW1 Contain  SW1 Type SW2 Contain  SW2 Type
0       Hull    Trunk           ?       NaN           ?       NaN
1       Hull     Roof           ?       NaN           ?       NaN
2       Hull  Ceiling         NaN       NaN           ?       NaN
3   Mid High    Floor           ?       NaN           ?       NaN
4   Mid High    Plank         NaN       NaN           ?       NaN
5  Mid High2     Fuel           ?       NaN           ?       NaN
6      Lower     Tank           ?       NaN         NaN       NaN
7     Lower2     Axel           ?       NaN         NaN       NaN

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1