'How can I combine these database rows using Python Pandas?

Inside this .csv file, there is information on Target stores. However, each Target store has 2 accounts, you can distinguish it based on the 071 and 212 at the beginning of the Sold-to Name. You can see for example, Super Target 2169 has 2 rows. I want to clean this sheet up for report purposes and condense both Super Target 2169 rows into 1 Uniquely Identified row, that combined both dollar amounts to be able to easily see how much money that store has done overall.

I have tried iterating even though I was told not to in Pandas. I have also tried using the groupby function, but I just cannot manipulate it in the way I want it. I am not expecting someone to tell me the code, but just a point in the right direction or function in which I can start researching would be amazing.

Example Database Snippet



Solution 1:[1]

You'd want to use groupby, but omit the prefix for each store (so that the actual pair of store rows can be combined). Consider a simplified version of your dataframe:

df = pd.DataFrame({'a':['071 SPR 2169', '071 SPR 21690', '212 SPR 2169', '212 SPR 21690'], 'b':[21.21, 23.21,22.22, 33.43]})  

Just stores and sales. You can split out the store prefix and the store name like this:

df[['a1', 'a2']] = df.apply(lambda x: pd.Series([x.a.split(' ')[0], ' '.join(x.a.split(' ')[1:])]), axis=1)

Leaving you with this:

               a      b   a1         a2
0   071 SPR 2169  21.21  071   SPR 2169
1  071 SPR 21690  23.21  071  SPR 21690
2   212 SPR 2169  22.22  212   SPR 2169
3  212 SPR 21690  33.43  212  SPR 21690

Then, groupby on the store name only, using aggregations to sum your sales:

total_df = df.groupby('a2')['b'].sum()

Giving you:

a2
SPR 2169     43.43
SPR 21690    56.64

If you need to include your store information, you can use aggregations with something like:

total_df = df.groupby('a2').agg({'b':'sum', 'city':'first', 'state':'first'})




                                                                                         

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 AlecZ