How can I group and reformat a dataframe based on a column?
This is my dataframe:
cardio variable value
0 0 cholesterol 0
1 1 cholesterol 1
2 1 cholesterol 1
3 1 cholesterol 0
4 0 cholesterol 0
... ... ... ...
419995 0 overweight 1
419996 1 overweight 1
419997 1 overweight 1
419998 1 overweight 1
419999 0 overweight 0
How can I split and group it based on the value of the "cardio" column and also get the counts? Like so:
cardio variable value total
0 0 active 0 6378
1 0 active 1 28643
2 0 alco 0 33080
...
cardio variable value total
21 1 overweight 1 24440
22 1 smoke 0 32050
23 1 smoke 1 2929
Solution 1:[1]
Let's try a dict comprehension:
The idea is to first split the dataframe into one group per cardio value with df.groupby('cardio').
Then apply your operation to each group, in this instance size(), and turn each result back into its own dataframe with reset_index().
We use a dictionary to hold the resulting dataframes in a single container rather than in separate variables.
data_dict = {
f"cardio_{cardio}": data.groupby(["variable", "value"]).size().reset_index(name='counts')
for cardio, data in df.groupby("cardio")
}
data_dict['cardio_0']
variable value counts
0 cholesterol 0 2
1 overweight 0 1
2 overweight 1 1
data_dict['cardio_1']
variable value counts
0 cholesterol 0 1
1 cholesterol 1 2
2 overweight 1 3
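For reference, here is a self-contained sketch of the same approach. The full 420,000-row frame is not available here, so the sample frame below is an assumption built only from the ten rows visible in the question; on that sample it should reproduce the two outputs shown above.

```python
import pandas as pd

# Sample frame built from the ten rows displayed in the question
# (an assumption for illustration, not the full dataset).
df = pd.DataFrame({
    "cardio":   [0, 1, 1, 1, 0, 0, 1, 1, 1, 0],
    "variable": ["cholesterol"] * 5 + ["overweight"] * 5,
    "value":    [0, 1, 1, 0, 0, 1, 1, 1, 1, 0],
})

# One counts frame per cardio value, keyed as "cardio_0" and "cardio_1".
data_dict = {
    f"cardio_{cardio}": data.groupby(["variable", "value"]).size().reset_index(name="counts")
    for cardio, data in df.groupby("cardio")
}

for name, frame in data_dict.items():
    print(name)
    print(frame)
```

If the column should be called total as in the question, passing name="total" to reset_index() instead of name="counts" should be enough.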
Solution 2:[2]
Sorting by the values of cardio column:
import pandas as pd
df = pd.read_csv('../')
df = df.sort_values(by='cardio')  # the column name must be passed as a string
Splitting the dataframe by the values of cardio:
df_cardio_0 = df[df['cardio']==0]
df_cardio_1 = df[df['cardio']==1]
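The boolean-mask split works whether or not the frame is sorted, but the split frames still lack the counts the question asks for. A minimal sketch of adding them, assuming pandas 1.1 or later (for DataFrame.value_counts); the sample frame and the names totals_0 and totals_1 are illustrative and not part of the original answer:

```python
import pandas as pd

# Sample frame built from the ten rows displayed in the question
# (an assumption for illustration, not the full dataset).
df = pd.DataFrame({
    "cardio":   [0, 1, 1, 1, 0, 0, 1, 1, 1, 0],
    "variable": ["cholesterol"] * 5 + ["overweight"] * 5,
    "value":    [0, 1, 1, 0, 0, 1, 1, 1, 1, 0],
})

# Split by cardio value with boolean masks.
df_cardio_0 = df[df["cardio"] == 0]
df_cardio_1 = df[df["cardio"] == 1]

# Count each (cardio, variable, value) combination within each split.
keys = ["cardio", "variable", "value"]
totals_0 = df_cardio_0.value_counts(keys, sort=False).reset_index(name="total")
totals_1 = df_cardio_1.value_counts(keys, sort=False).reset_index(name="total")

print(totals_0)
print(totals_1)
```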
Solution 3:[3]
You can write a function that concatenates the filtered frames for each value of your split column.
def group(column, frame):
    # Rows where the column is 0 first, followed by rows where it is 1
    return pd.concat([frame[frame[column] == 0], frame[frame[column] == 1]])
Then pass your dataframe to this function.
group("cardio", df)
Solution 4:[4]
You can try this:
df0 = df.query('cardio==0').groupby(['cardio','variable','value']).size().reset_index(name='total')
print(df0)
df1 = df.query('cardio==1').groupby(['cardio','variable','value']).size().reset_index(name='total')
print(df1)
Solution 5:[5]
If your dataframe is df_1, you can use groupby to count each combination of cardio, variable and value in one step:
df_2 = df_1.groupby(['cardio','variable','value']).size().reset_index(name='counts')
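To get from this single summary to the two per-cardio frames shown in the question, the summary itself can be grouped again. The sketch below is one possible way to do that, not the original answerer's code: the sample frame is an assumption built from the ten rows visible in the question, the column is named total to match the desired output, and per_cardio is a hypothetical name.

```python
import pandas as pd

# Sample frame built from the ten rows displayed in the question
# (an assumption for illustration, not the full dataset).
df_1 = pd.DataFrame({
    "cardio":   [0, 1, 1, 1, 0, 0, 1, 1, 1, 0],
    "variable": ["cholesterol"] * 5 + ["overweight"] * 5,
    "value":    [0, 1, 1, 0, 0, 1, 1, 1, 1, 0],
})

# Count every (cardio, variable, value) combination in a single pass.
df_2 = df_1.groupby(["cardio", "variable", "value"]).size().reset_index(name="total")

# Split the summary into one frame per cardio value; the original row
# labels are kept, as in the question's desired output.
per_cardio = {cardio: part for cardio, part in df_2.groupby("cardio")}

print(per_cardio[0])
print(per_cardio[1])
```

Grouping once and splitting afterwards avoids repeating the same query for each cardio value, which matters more as the number of distinct values grows.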
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | halfer |
| Solution 2 | Arnab |
| Solution 3 | David Buck |
| Solution 4 | |
| Solution 5 | Ksam |
