Pandas - how to "group by" and then concatenate strings
In pandas, I'm looking to "group by" the values in column A and concatenate the strings in column B. However, I only want consecutive rows with identical values to be grouped together: if there is a break in that value, the grouping should start again.
Ideally I would like to do this without looping.
Not sure where to start with this. Does anyone have a suggestion for the best pandas function to work with?
Here is an example. I want to transform this:
'A' 'B'
0 faa hello
1 faa there
2 foo hi
3 faa how
4 faa are
5 faa you
6 foo i am well
7 foo thank you
Into this:
'A' 'B'
0 faa hello there
2 foo hi
3 faa how are you
6 foo i am well thank you
Solution 1:[1]
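The requirement that the grouping restart whenever the value in A changes is the tricky part. We accomplish it with a special groupby key: comparing df['A'] with df['A'].shift() gives True at every row where the value differs from the previous row, and the cumulative sum of those flags gives each consecutive run of identical values its own ID: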
df.groupby((df['A'] != df['A'].shift()).cumsum()).agg({'A':'first', 'B':' '.join})
output:
A B
A
1 faa hello there
2 foo hi
3 faa how are you
4 foo i am well thank you
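To make the groupby key less opaque, here is a small self-contained sketch that rebuilds the question's sample frame and prints the intermediate run IDs before aggregating. The names changed, run_id and out are illustrative only. The final relabelling step is an optional extra, not part of the original answer: it replaces the run IDs (1 to 4) with the first original index label of each run, so the result matches the 0, 2, 3, 6 layout shown in the question.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": ["faa", "faa", "foo", "faa", "faa", "faa", "foo", "foo"],
    "B": ["hello", "there", "hi", "how", "are", "you", "i am well", "thank you"],
})

# True wherever A differs from the previous row
# (row 0 is compared against NaN, so it is always True)
changed = df["A"] != df["A"].shift()

# Running count of changes: each consecutive run of equal A values gets its own ID
run_id = changed.cumsum()
print(run_id.tolist())  # [1, 1, 2, 3, 3, 3, 4, 4]

# Same aggregation as in the answer: first A per run, B values joined with spaces
out = df.groupby(run_id).agg({"A": "first", "B": " ".join})

# Optional extra: rows where `changed` is True are exactly the first row of each run,
# in the same order as the sorted run IDs, so their labels can replace the index
out.index = df.index[changed.to_numpy()]
print(out)
```

Because the runs are numbered in order of appearance and groupby sorts its keys by default, the aggregated rows line up one-to-one with the rows flagged in changed.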
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | piterbarg |
