Python DataFrame groupby function not producing expected output
I have been struggling with this for several days, and this is my last resort.
Here is an example of my data:
| PRODUCT NAME | PRICE | LINK | INSTOCK | TAGS |
|----------------|-------|-----------|---------|--------|
| the best shirt | 1.00 | www.alink | true | cotton |
| the best shirt | 1.00 | www.alink | true | yellow |
| the best pants | 2.00 | www.alink | true | denim |
and here is what I would like:
| PRODUCT NAME | PRICE | LINK | INSTOCK | TAGS |
|----------------|-------|-----------|---------|----------------|
| the best shirt | 1.00 | www.alink | true | cotton, yellow |
| the best pants | 2.00 | www.alink | true | denim |
This is the code I am using:

    df = df.groupby(['PRODUCT NAME', 'PRICE', 'LINK', 'INSTOCK'])[['TAGS']].apply(', '.join)

and this is what I get:
| 0 |
|------|
| TAGS |
I am completely at a loss as to what could be causing this. Thanks in advance <3
Solution 1:[1]
Looking at the output, I would assume your problem stems from having NaNs in your data.
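If missing values are the suspect, a quick check is possible. The sketch below assumes a cut-down version of the question's data with one NaN added purely for illustration; `n_missing` and `result` are hypothetical names. Note that `str.join` raises a `TypeError` when it meets a non-string item such as NaN, so dropping the missing tags inside each group is one way to keep the join working.

```python
import numpy as np
import pandas as pd

# Cut-down sample with one missing tag (the NaN is an assumption for illustration)
df = pd.DataFrame({
    "PRODUCT NAME": ["the best shirt", "the best shirt", "the best pants"],
    "TAGS": ["cotton", np.nan, "denim"],
})

# Count how many tags are missing
n_missing = int(df["TAGS"].isna().sum())

# Drop missing values within each group so that join only sees strings
result = (
    df.groupby("PRODUCT NAME", sort=False)["TAGS"]
      .agg(lambda tags: ", ".join(tags.dropna()))
      .reset_index()
)
print(n_missing)
print(result)
```

If `n_missing` is zero on the real data, NaNs are probably not the cause.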
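Referring to your example, this approach works for me. Note that it selects TAGS as a Series (`.TAGS`) rather than as a one-column DataFrame (`[['TAGS']]`):

    import pandas as pd

    # Rebuild the sample data from the question
    products = {'PRODUCT NAME': ['the best shirt', 'the best shirt', 'the best pants'],
                'PRICE': [1.00, 1.00, 2.00],
                'LINK': ['link', 'link', 'link'],
                'INSTOCK': [True, True, True],
                'TAGS': ['cotton', 'yellow', 'denim']}
    df = pd.DataFrame(products)
    # Join each group's tags, then turn the group keys back into columns
    df = df.groupby(['PRODUCT NAME', 'PRICE', 'LINK', 'INSTOCK']).TAGS.apply(lambda x: ', '.join(x)).reset_index()
    df
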
The result has one row per product, with that product's tags joined into a single comma-separated string.
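As a side note on why the original code returns the literal string TAGS, here is a minimal sketch, assuming the sample data from the question. Selecting with double brackets (`[['TAGS']]`) passes a one-column DataFrame to the function, and iterating a DataFrame yields its column labels rather than its values. The names `keys`, `joined_frame`, and `joined_series` are only illustrative, and `sort=False` is an optional extra that keeps the groups in their original order.

```python
import pandas as pd

df = pd.DataFrame({
    "PRODUCT NAME": ["the best shirt", "the best shirt", "the best pants"],
    "PRICE": [1.00, 1.00, 2.00],
    "LINK": ["www.alink", "www.alink", "www.alink"],
    "INSTOCK": [True, True, True],
    "TAGS": ["cotton", "yellow", "denim"],
})
keys = ["PRODUCT NAME", "PRICE", "LINK", "INSTOCK"]

# Iterating a DataFrame yields its column labels, so join only sees "TAGS"
joined_frame = ", ".join(df[["TAGS"]])

# Iterating a Series yields its values, which is what should be joined
joined_series = ", ".join(df["TAGS"])

# Single brackets select a Series per group; sort=False keeps first-appearance order
result = df.groupby(keys, sort=False)["TAGS"].agg(", ".join).reset_index()
print(joined_frame)
print(joined_series)
print(result)
```

So switching from `[['TAGS']]` to `['TAGS']` (or `.TAGS`) is likely enough to get the expected output on data without missing tags.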
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | LuckyLuke |
