'pandas merge rows with overlapping column names using default non-NaN value

There is a list of tables and some configurations, specified in csv format.

I want to use default configuration from configs, and override it in tables if needed

configs = """  
schema|new_name
dbo|{table}
qa|test_{table}
"""

tables = """  
table|schema|new_name
employee|hr|{schema}_{table}
advertisers
users
"""

configs = pandas.read_csv(io.StringIO(configs), sep='|')  
tables = pandas.read_csv(io.StringIO(tables), sep='|')

I want to cross-join/merge/concatenate/combine them to get a dataframe which contains:

final = """
table|schema|new_name

employee|hr|{schema}_{table}
advertisers|dbo|{table}
users|dbo|{table}

employee|hr|{schema}_{table}
advertisers|qa|test_{table}
users|qa|test_{table}
"""

If schema is not specified, use 'dbo' schema and 'users/advertisers' table name.
If schema is specified, use 'hr' schema and 'hr_employee' table name.

Basically - when horizontal concat of 2 rows with overlapping column names, create one column using whichever value is not NaN.

What pandas command should I use ?



Solution 1:[1]

EDIT:

#cross join both DataFrames
df = tables.merge(configs, suffixes=('','_'), how='cross')

#get columns with suffix _
cols = df.columns[df.columns.str.endswith('_')]
#remove suffix
new = cols.str.strip('_')
#replace missing values from cols by _ colums
df[new] = df[new].fillna(df[cols].rename(columns=lambda x: x.strip('_')))
#remove columns with _
df = df.drop(cols, axis=1)
print (df)
         table schema          new_name
0     employee     hr  {schema}_{table}
1     employee     hr  {schema}_{table}
2  advertisers    dbo           {table}
3  advertisers     qa      test_{table}
4        users    dbo           {table}
5        users     qa      test_{table}

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1