Transform file with multiheader, fillna and different formats

I have Excel files with a multi-level header (multiheader) that need some advanced steps to read and clean.


  1. The file may come with extra lines above the header (some files have a row with the text "Sales Report", some don't). So the script should automatically detect the position of the header rows (in some cases rows 4-7, in others rows 3-6).
  2. Some column names are missing (next to Material Group and Material). The script should fill them with "Material Group Name" and "Material Name" respectively, or in general with the name of the column on the left plus the suffix " Name".
  3. There are missing values in those columns, which should be filled using the "ffill" method.
  4. Finally, the file should be transformed into a flat table, so that the final list of columns is:

Material Group | Material Group Name | Brand | Material | Material Name | Ean/UPC | Cal year month | Plant | Plant Name | Sales Qty | Sales at Retail with Tax
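
To make steps 1 and 2 concrete, here is a minimal sketch in plain Python. It is not a definitive solution and rests on assumptions: the raw sheet is available as a list of rows (for example from `pd.read_excel(path, header=None, dtype=str).values.tolist()`), and the last header row is the one that contains the text "Material Group". The helper names `is_blank`, `find_label_row` and `fill_blank_names`, as well as the sample rows, are made up for illustration.

```python
def is_blank(value):
    """True for None, NaN (as pandas returns for empty cells) and empty strings."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN is never equal to itself
        return True
    return str(value).strip() == ""


def find_label_row(rows, marker="Material Group"):
    """Return the 0-based index of the first row that contains the marker text."""
    for index, row in enumerate(rows):
        cells = {str(cell).strip() for cell in row if not is_blank(cell)}
        if marker in cells:
            return index
    raise ValueError(f"no row containing {marker!r} found")


def fill_blank_names(labels):
    """Replace each blank label with '<label on the left> Name'."""
    filled = []
    for label in labels:
        if is_blank(label) and filled:
            filled.append(f"{filled[-1]} Name")
        else:
            filled.append("" if is_blank(label) else str(label).strip())
    return filled


# Made-up sample: an optional title row, a blank row, then four header rows.
sample_rows = [
    ["Sales Report", None, None],
    [None, None, None],
    [None, None, "01.2020"],
    [None, None, "P001"],
    [None, None, "Store A"],
    ["Material Group", None, "Sales Qty"],
    ["G1", "Group one", "1"],
]

label_row = find_label_row(sample_rows)
column_names = fill_blank_names(sample_rows[label_row])
```

With four header rows, as in the question, the header block would then be rows `label_row - 3` through `label_row`, whether or not a "Sales Report" line is present. Note that two blank labels in a row would produce "... Name Name", so this only suits the layout described above.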

It looks like this task requires advanced knowledge of multiheaders, so that the data is not transformed with a pivot table. The data (numbers) should be taken as is, which is why I am reading with the dtype=str parameter.
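
Steps 3 and 4 might look roughly like the sketch below, which reshapes without any pivot or aggregation, so every cell stays the string that was read. The layout is an assumption, since the sample sheet is not shown here: three header levels (Cal year month, Plant, Plant Name) sit above a row of measure names, and merged header cells come back blank except for the first one. The helpers `is_blank`, `ffill` and `flatten` and the sample values are hypothetical.

```python
def is_blank(value):
    """True for None, NaN (as pandas returns for empty cells) and empty strings."""
    if value is None:
        return True
    if isinstance(value, float) and value != value:  # NaN is never equal to itself
        return True
    return str(value).strip() == ""


def ffill(values):
    """Forward-fill blanks in a list with the last non-blank value seen."""
    filled, last = [], None
    for value in values:
        if not is_blank(value):
            last = value
        filled.append(last)
    return filled


def flatten(level_rows, measure_row, data_rows, id_names, level_names):
    """Turn a wide multi-header sheet into a list of flat records (dicts)."""
    n_ids = len(id_names)
    # Merged header cells are blank after the first one, so fill to the right.
    levels = [ffill(row[n_ids:]) for row in level_rows]
    measures = measure_row[n_ids:]
    # Group value-column positions by their header key, keeping sheet order.
    groups = {}
    for position, key in enumerate(zip(*levels)):
        groups.setdefault(key, []).append(position)

    records = []
    last_ids = [None] * n_ids
    for row in data_rows:
        # Step 3: forward-fill the identifier cells downwards.
        last_ids = [last if is_blank(cell) else cell
                    for cell, last in zip(row[:n_ids], last_ids)]
        values = row[n_ids:]
        # Step 4: one output record per (month, plant, plant name) group.
        for key, positions in groups.items():
            record = dict(zip(id_names, last_ids))
            record.update(zip(level_names, key))
            for position in positions:
                record[measures[position]] = values[position]
            records.append(record)
    return records


# Made-up sample: three identifier columns, one month, two plants, two measures.
id_names = ["Material Group", "Material Group Name", "Material"]
level_names = ["Cal year month", "Plant", "Plant Name"]
level_rows = [
    [None, None, None, "01.2020", None, None, None],
    [None, None, None, "P001", None, "P002", None],
    [None, None, None, "Store A", None, "Store B", None],
]
measure_row = id_names + ["Sales Qty", "Sales at Retail with Tax"] * 2
data_rows = [
    ["G1", "Group one", "M1", "1", "10.50", "2", "21.00"],
    [None, None, "M2", "3", "31.50", "4", "42.00"],
]

records = flatten(level_rows, measure_row, data_rows, id_names, level_names)
```

Each record carries the identifier columns first, then the header levels, then the measures, which matches the column order listed in the question. If a DataFrame is needed, `pd.DataFrame(records)` should accept this list of dicts.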



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source