'Group by don't work after used for third time
I don't know what I am doing wrong, but my dataframe does not groupby as I would expect it to do:
This is actual result of my script, but i wanted to fourth to be also grouped - see images below
For some reason it works fine for two first columns, but struggles lately and I don't see thing that could be wrong. Thank you all for your help.
Here is source-code.
import pandas as pd
data = [('AAA1', 'BBB1', 'XXX1', 46.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX2', 1.0, 'YYY3'),
('AAA1', 'BBB1', 'XXX2', 2.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX2', 2.0, 'DDD'),
('AAA1', 'BBB1', 'XXX5', 3.0, 'YYY6'), ('AAA1', 'BBB1', 'XXX6', 3.0, 'YYY1'),
('AAA1', 'BBB1', 'XXX4', 10.0, 'YYY1'), ('AAA1', 'BBB1', 'XXX3', 24.0, 'YYY1'),
('AAA1', 'BBB1', 'XXX5', 4.0, 'YYY89'), ('AAA1', 'BBB1', 'XXX2', 8.0, 'YYY6'),
('AAA1', 'BBB1', 'XXX5', 26.0, 'YYY1'), ('AAA1', 'BBB2', 'XXX2', 1.0, 'DDD'),
('AAA1', 'BBB2', 'XXX5', 1.5, 'YYY3'), ('AAA1', 'BBB2', 'XXX5', 12.0, 'YYY6'),
('AAA10', 'BBB42', 'XXX2', 1.0, 'YYY3'), ('AAA10', 'BBB42', 'XXX2', 2.0, 'YYY89'),
('AAA10', 'BBB42', 'XXX5', 7.0, 'YYY3'), ('AAA12', 'BBB20', 'XXX5', 3.5, 'YYY3'),
('AAA12', 'BBB52', 'XXX4', 8.0, 'YYY3'), ('AAA13', 'BBB21', 'XXX4', 3.0, 'YYY3'),
('AAA13', 'BBB23', 'XXX4', 3.0, 'YYY3'), ('AAA13', 'BBB23', 'XXX5', 1.0, 'YYY6'),
('AAA13', 'BBB23', 'XXX5', 2.0, 'YYY3'), ('AAA13', 'BBB24', 'XXX2', 6.5, 'YYY11'),
('AAA13', 'BBB24', 'XXX2', 7.0, 'YYY10'), ('AAA13', 'BBB24', 'XXX5', 2.0, 'YYY3'),
('AAA13', 'BBB24', 'XXX5', 5.0, 'YYY3'), ('AAA13', 'BBB65', 'XXX5', 8.0, 'YYY19'),
('AAA14', 'BBB26', 'XXX2', 14.0, 'YYY3'), ('AAA14', 'BBB26', 'XXX6', 4.0, 'YYY3'),
('AAA14', 'BBB77', 'XXX2', 3.0, 'YYY19'), ('AAA14', 'BBB77', 'XXX2', 19.5, 'YYY3'),
('AAA15', 'BBB30', 'XXX4', 1.0, 'YYY3'), ('AAA15', 'BBB30', 'XXX3', 8.0, 'YYY3'),
('AAA15', 'BBB30', 'XXX5', 1.0, 'YYY3'), ('AAA16', 'BBB33', 'XXX6', 0.5, 'YYY3'),
('AAA17', 'BBB57', 'XXX4', 4.0, 'YYY3'), ('AAA18', 'BBB36', 'XXX4', 1.0, 'YYY3'),
('AAA2', 'BBB61', 'XXX2', 1.0, 'YYY3'), ('AAA2', 'BBB61', 'XXX4', 4.0, 'YYY3'),
('AAA32', 'BBB76', 'XXX4', 1.0, 'YYY3'), ('AAA32', 'BBB76', 'XXX3', 16.0, 'YYY3'),
('AAA6', 'BBB15', 'XXX3', 8.0, 'YYY6'), ('AAA7', 'BBB10', 'XXX6', 51.0, 'YYY3'),
('AAA7', 'BBB12', 'XXX5', 8.0, 'YYY3'), ('AAA29', 'BBB38', 'XXX4', 12.0, 'YYY3'),
('AAA18', 'BBB40', 'XXX1', 16.0, 'YYY3')]
df = pd.DataFrame(data,
columns=["first", "second", "third", "number", "fourth"])
df["first_numbers"] = df.groupby("first")["number"].transform("sum")
df["second_numbers"] = df.groupby(["first", "second"])["number"].transform("sum")
df["third_numbers"] = df.groupby(["first", "second", "fourth"])["number"].transform("sum")
df.set_index(["first", "first_numbers", "second",
"second_numbers", "fourth", "third_numbers", "third"], inplace=True)
xlsx_writer = pd.ExcelWriter("new.xlsx", engine="xlsxwriter")
df.to_excel(excel_writer=xlsx_writer,
sheet_name="name",
index_label=["First", "First Number", "Second", "Second Number", "Fourth", "Fourth Number", "Third", "Third Number"],
engine="xlsxwriter",
startrow=0)
xlsx_writer.save()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
