Python DataFrame groupby function not producing expected output

I have been struggling with this for several days, and this is my last resort.

Here is an example of my data:

| PRODUCT NAME   | PRICE | LINK      | INSTOCK | TAGS   |
|----------------|-------|-----------|---------|--------|
| the best shirt | 1.00  | www.alink | true    | cotton |
| the best shirt | 1.00  | www.alink | true    | yellow |
| the best pants | 2.00  | www.alink | true    | denim  |

And here is what I would like:

| PRODUCT NAME   | PRICE | LINK      | INSTOCK | TAGS           |
|----------------|-------|-----------|---------|----------------|
| the best shirt | 1.00  | www.alink | true    | cotton, yellow |
| the best pants | 2.00  | www.alink | true    | denim          |

This is the code I am using:

df = df.groupby(['PRODUCT NAME', 'PRICE', 'LINK', 'INSTOCK'])[['TAGS']].apply(', '.join)

And this is what I get:

| 0    |
|------|
| TAGS |

I am completely at a loss as to what could be causing this. Thanks in advance <3



Solution 1:[1]

Looking at the output, I would assume your problem stems from having NaNs in your data.
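If missing tags are indeed present, note that `', '.join` raises a `TypeError` when it meets a NaN, because NaN is a float rather than a string. A minimal sketch of one way to guard against that, assuming missing tags should simply be skipped (the data and the names `df_nan`, `join_tags`, and `result_nan` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing tag
df_nan = pd.DataFrame({
    'PRODUCT NAME': ['the best shirt', 'the best shirt', 'the best pants'],
    'PRICE': [1.00, 1.00, 2.00],
    'TAGS': ['cotton', np.nan, 'denim'],
})

def join_tags(tags):
    # Drop missing values first, then join the remaining strings
    return ', '.join(tags.dropna().astype(str))

result_nan = (
    df_nan.groupby(['PRODUCT NAME', 'PRICE'])['TAGS']
    .apply(join_tags)
    .reset_index()
)
```

Whether dropping missing tags is the right choice depends on the data; replacing them with a placeholder via `fillna` would be another option.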

Referring to your example, for me this approach works:

import pandas as pd

# Sample data mirroring the example in the question
products = {'PRODUCT NAME': ['the best shirt', 'the best shirt', 'the best pants'],
            'PRICE': [1.00, 1.00, 2.00],
            'LINK': ['link', 'link', 'link'],
            'INSTOCK': [True, True, True],
            'TAGS': ['cotton', 'yellow', 'denim']}
df = pd.DataFrame(products)
# Select TAGS as a Series (attribute access, not [['TAGS']]) and join the tags within each group
df = df.groupby(['PRODUCT NAME', 'PRICE', 'LINK', 'INSTOCK']).TAGS.apply(', '.join).reset_index()
df

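Your result:

| PRODUCT NAME   | PRICE | LINK | INSTOCK | TAGS           |
|----------------|-------|------|---------|----------------|
| the best pants | 2.00  | link | True    | denim          |
| the best shirt | 1.00  | link | True    | cotton, yellow |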
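A likely explanation for the lone `TAGS` in the question's output, independent of any NaNs, is the double brackets. Selecting `[['TAGS']]` gives `apply` a one-column DataFrame per group, and iterating a DataFrame yields its column labels rather than its values, so `', '.join` just returns the label. Selecting a single column as a Series iterates over the values instead. A minimal sketch reusing the sample data from this answer (the names `sample`, `labels_joined`, `values_joined`, and `joined` are illustrative):

```python
import pandas as pd

sample = pd.DataFrame({
    'PRODUCT NAME': ['the best shirt', 'the best shirt', 'the best pants'],
    'PRICE': [1.00, 1.00, 2.00],
    'LINK': ['link', 'link', 'link'],
    'INSTOCK': [True, True, True],
    'TAGS': ['cotton', 'yellow', 'denim'],
})

# Iterating a DataFrame yields column labels, so only the label is joined
labels_joined = ', '.join(sample[['TAGS']])

# Iterating a Series yields the cell values, so the tags are joined
values_joined = ', '.join(sample['TAGS'])

# Single brackets select a Series per group; agg joins the tags in each group
keys = ['PRODUCT NAME', 'PRICE', 'LINK', 'INSTOCK']
joined = sample.groupby(keys)['TAGS'].agg(', '.join).reset_index()
```

Here `labels_joined` is the string `'TAGS'`, while `values_joined` contains the actual tags, which matches the difference between the question's call and the one in this answer.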

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 LuckyLuke