'dataframe transformation python products atrributes values

I have an excel file with products like this below. Is it possible to align the same kind of attributes to same column using python?

I have this

category name 1 2 3 4 5 6
board games board game 1 kind family box color red weight 0.7
board games board game 2 kind card game box color blue lenght 25
board games board game 3 box color green weight 0.5 lenght 32

Desired output

category name kind box color weight length
board games board game1 family games red 0.7
board games board game2 card games blue 25
board games board game3 green 0.5 32

i have an other case with duplicate values.

category name 1 2 3 4 5 6 7 8 9 10
smartphones samsung1 sim type dual color green ram 4GB Storage 128GB year 2021
smartphones samsung2 sim type dual color green ram 4GB storage 256GB year 2021
smartphones xiaomi3 sim type dual color blue ram 6GB length 32mm Storage 128GB

desired output:

category name sim type color ram Storage length year
smartphones samsung1 dual green 4GB 128GB nan 2021
smartphones samsung2 dual green 4GB 256GB nan 2021
smartphones xiaomi3 dual blue 6GB 128GB 32mm nan

error message:

ValueError: Index contains duplicate entries, cannot reshape

my code

the error

UPDATE_2:

Index contains duplicate entries, cannot reshape

category name 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
smartphones samsung1 sim type dual color green ram 4GB Storage 128GB year 2021 camera yes fingerprint yes accelometer yes gyroscope yes NFC yes
smartphones samsung2 sim type dual color green ram 4GB storage 256GB year 2021 camera yes fingerprint yes
smartphones xiaomi3 sim type dual color blue ram 6GB length 32mm Storage 128GB camera yes fingerprint yes

desired output:

category name sim type color ram Storage length year camera fingerprint accelometer gyroscope NFC
smartphones samsung1 dual green 4GB 128GB nan 2021 yes yes yes yes no
smartphones samsung2 dual green 4GB 256GB nan 2021 yes yes nan nan nan
smartphones xiaomi3 dual blue 6GB 128GB 32mm nan yes yes nan nan nan

error 3



Solution 1:[1]

Update

To solve duplicate issue, you can try:

out = df.set_index(['category', 'name'], append=True).melt(ignore_index=False)
m = out['variable'].astype(int) % 2 == 1
out = (pd.concat([out.loc[m, 'value'].rename('variable'),
                  out.loc[~m, 'value']], axis=1)
         .set_index('variable', append=True)['value']
         .unstack('variable').reset_index(['category', 'name'])
         .rename_axis(columns=None))

Output:

category name Storage color length ram sim type year
smartphones samsung1 128GB green nan 4GB dual 2021
smartphones samsung2 256GB green nan 4GB dual 2021
smartphones xiaomi3 128GB blue 32mm 6GB dual nan

Old answer

You can mainly use stack and unstack to rearrange your dataframe:

# Separate odd (names) and even (values) columns
out = (pd.concat([df.iloc[:, 2::2].stack().droplevel(1),
                  df.iloc[:, 3::2].stack().droplevel(1)], axis=1)
         .set_index(0, append=True)[1].unstack(0))

out = pd.concat([df.iloc[:, :2], out], axis=1)

Output:

category name box color kind lenght weight
board games board game 1 red family nan 0.7
board games board game 2 blue card game 25 nan
board games board game 3 green nan 32 0.5

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1