Pandas - how to "group by" and then add up strings

In pandas, I'm looking to "group by" a value in column A and join the strings in column B. Additionally, I'd like rows to be grouped only when identical values in column A appear consecutively. If there is a break in that value, the grouping should start again.

Ideally I would like to do this without looping.

Not sure where to start with this. Does anyone have a suggestion for the best pandas function to work with?

Here is an example. I want to transform this:

    'A' 'B'
0   faa hello
1   faa there
2   foo hi
3   faa how
4   faa are
5   faa you
6   foo i am well
7   foo thank you

Into this:

    'A' 'B'
0   faa hello there
2   foo hi
3   faa how are you
6   foo i am well thank you


Solution 1:[1]

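The requirement "If there is a break in that value, then I am looking for the grouping to start again." is a bit tricky. We accomplish it with a special groupby key: comparing each value in A with the previous row marks where a new run starts, and the cumulative sum of those marks gives every consecutive run its own group id: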

df.groupby((df['A'] != df['A'].shift()).cumsum()).agg({'A':'first', 'B':' '.join})

output:


    A   B
A       
1   faa hello there
2   foo hi
3   faa how are you
4   foo i am well thank you
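
To see why the grouping key works, here is a minimal sketch that rebuilds the example frame from the question and breaks the one-liner into steps. The variable names (starts_new_run, run_id) are only illustrative. The last step, which relabels the result with the index of the first row of each run (0, 2, 3, 6, as in the desired output of the question), is an optional addition and not part of the answer above.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "A": ["faa", "faa", "foo", "faa", "faa", "faa", "foo", "foo"],
    "B": ["hello", "there", "hi", "how", "are", "you", "i am well", "thank you"],
})

# True wherever A differs from the row above. The first row is always True,
# because shift() leaves NaN there and NaN != "faa".
starts_new_run = df["A"] != df["A"].shift()

# Running total of the True flags: each consecutive run gets its own integer id.
run_id = starts_new_run.cumsum()
print(run_id.tolist())  # [1, 1, 2, 3, 3, 3, 4, 4]

# Group on the run id: keep the first A of each run, join the B strings with spaces.
result = df.groupby(run_id).agg({"A": "first", "B": " ".join})

# Optional (assumption: you want the original row labels back):
# label each group with the index of the row that started the run.
result.index = df.index[starts_new_run.to_numpy()]
print(result)
```

Because the key is built from shift() and cumsum() only, no explicit loop over the rows is needed.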

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 piterbarg