Extract certain values from different columns in dataframe [closed]

I have an Excel sheet and I want to extract values from several columns into a single column.

[Image: desired Excel sheet format]

First, I want to figure out how to deal with subheaders like astro and athens grey, and then how to extract information in this pattern. Thanks.

[Image: sample output]

I have managed to resolve the subheader issue. Now I just want help with a regex to extract the information in the desired format. Here is what I have done so far: Subheaders



Solution 1:[1]

See if it helps:

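import numpy as np
import pandas as pd

data = pd.read_excel('Sample.xlsx')
# Rows with exactly six empty cells (used here to inspect the subheader rows).
subheaders = data[data.isna().sum(axis=1) == 6]
data = data.dropna(how='all')  # drop rows that are completely empty
# Keep the SKU text before any parentheses, blank out values containing digits
# (the real SKUs), then forward-fill so each row carries its subheader name.
group = data['SKU'].astype(str).str.extract(r'([^()]*)')[0].str.strip().replace(r'\d+', np.nan, regex=True).ffill()
# First run of SIZE characters that are neither digits nor 'x'.
size_text = data['SIZE'].str.extract(r'([^0-9x]+)')[0].fillna('')
result = group + ' ' + data['DESCRIPTION'] + ' ' + size_text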

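Since Sample.xlsx is not available here, the same idea can be tried on a small made-up frame. This is only a sketch: the rows below are hypothetical, and it assumes that a subheader row is any row whose SKU contains no digit. It uses Series.where instead of replace to blank out the real SKUs, which is a swapped-in but equivalent step.

```python
import pandas as pd

# Hypothetical stand-in for the spreadsheet: subheader rows ("ASTRO",
# "ATHENS GREY") have a name in SKU and nothing in the other columns.
data = pd.DataFrame({
    "SKU": ["ASTRO", "1001 (A)", "1002", "ATHENS GREY", "2001"],
    "DESCRIPTION": [None, "Wall Tile", "Floor Tile", None, "Mosaic"],
    "SIZE": [None, "300x600 Matt", "600x600 Polished", None, "300x300"],
})

sku = data["SKU"].astype(str)

# Assumption: a SKU without any digit is a subheader, not a product code.
is_subheader = ~sku.str.contains(r"\d")

# Keep only the subheader names and forward-fill them onto the rows below.
group = sku.where(is_subheader).ffill()

# First run of SIZE characters that are neither digits nor 'x'
# (same pattern as the answer), with surrounding spaces removed.
size_text = (
    data["SIZE"].str.extract(r"([^0-9x]+)", expand=False).fillna("").str.strip()
)

# Join the pieces and drop the subheader rows themselves.
label = (group + " " + data["DESCRIPTION"].fillna("") + " " + size_text).str.strip()
result = label[~is_subheader].tolist()
print(result)
```

For this made-up data, result is ['ASTRO Wall Tile Matt', 'ASTRO Floor Tile Polished', 'ATHENS GREY Mosaic']. Note that the SIZE pattern stops at the first 'x', so a finish name containing that letter (for example "Textured") would be cut short; a stricter pattern may be needed for real data.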

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1