Pandas: Restructure a dataframe to column values
I have the following dataframe where the cities are columns and ages are the values:
| City1 | City2 | City3 |
|---|---|---|
| 2 | 14 | 61 |
| 51 | 73 | 35 |
| 42 | 38 | 13 |
| 12 | 75 | 24 |
| 27 | 42 | 78 |
I want to create a new dataframe where the columns are age groups, and the cities are the index, like so:
| | 0-20 | 20-40 | 40-60 | 60-80 |
|---|---|---|---|---|
| City1 | 2 | 1 | 1 | 0 |
| City2 | 1 | 1 | 1 | 0 |
| City3 | 1 | 2 | 0 | 2 |
Is this possible to do in pandas?
Solution 1:[1]
Try this, using pd.cut:
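import numpy as np
import pandas as pd
# Stack to long form, then bin every age into a labelled, right-closed interval
dfc = pd.cut(df.rename_axis('Cities', axis=1).stack(),
             bins=[-np.inf, 20, 40, 60, np.inf],
             labels='0-20 20-40 40-60 60-80'.split(' ')).reset_index()
# Count the ages per city and age group
pd.crosstab(dfc['Cities'], dfc[0]).reset_index()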
Output:
0 Cities  0-20  20-40  40-60  60-80
0  City1     2      1      2      0
1  City2     1      1      1      2
2  City3     1      2      0      2
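For readers who want to run the idea end to end, here is a minimal self-contained sketch of the same `pd.cut` plus `pd.crosstab` approach. It swaps `stack` for `melt`, which is a substitution and not part of the original answer, and the names `long`, `AgeGroup`, and `result` are illustrative only. It also assumes finite bin edges from 0 to 80 instead of the infinite outer edges used above.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "City1": [2, 51, 42, 12, 27],
    "City2": [14, 73, 38, 75, 42],
    "City3": [61, 35, 13, 24, 78],
})

bins = [0, 20, 40, 60, 80]                      # assumed edges; intervals are right-closed by default
labels = ["0-20", "20-40", "40-60", "60-80"]

# Long form: one row per (city, age) pair
long = df.melt(var_name="City", value_name="Age")

# Assign each age to its labelled group
long["AgeGroup"] = pd.cut(long["Age"], bins=bins, labels=labels)

# Count ages per city and age group; cities become the index
result = pd.crosstab(long["City"], long["AgeGroup"])
print(result)
```

Because the cities end up as the index here, no `reset_index()` call is needed to match the layout asked for in the question.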
Solution 2:[2]
Here is a solution using pd.Series.between for every combination of age range and city.
new_data = []
for city in df.columns:
    new_city = []
    for left, right in [(0, 20), (20, 40), (40, 60), (60, 80)]:
        # inclusive="left" counts ages with left <= age < right
        new_city.append(df[city].between(left, right, inclusive="left").sum())
    new_data.append(new_city)
# Pass df.columns directly; wrapping it in a list would build a one-level MultiIndex
new_df = pd.DataFrame(new_data, columns=["0-20", "20-40", "40-60", "60-80"], index=df.columns)
new_df
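One detail worth checking when comparing the solutions is which side of each interval is closed, since `pd.cut` defaults to right-closed bins while `between(..., inclusive="left")` is left-closed. The sketch below uses made-up boundary ages (20, 40, 60) that do not appear in the question's data, so the counts above are unaffected; it only illustrates where such values would land. String values for `inclusive` assume a reasonably recent pandas version.

```python
import pandas as pd

# Hypothetical ages sitting exactly on the bin edges
ages = pd.Series([20, 40, 60])
bins = [0, 20, 40, 60, 80]
labels = ["0-20", "20-40", "40-60", "60-80"]

# Default: right-closed intervals (0, 20], (20, 40], ...
right_closed = pd.cut(ages, bins=bins, labels=labels)

# right=False: left-closed intervals [0, 20), [20, 40), ...
left_closed = pd.cut(ages, bins=bins, labels=labels, right=False)

# between with inclusive="left" tests 20 <= age < 40
in_range = ages.between(20, 40, inclusive="left")

print(list(right_closed))   # each edge value falls in the lower group
print(list(left_closed))    # each edge value falls in the upper group
print(in_range.tolist())
```

If the two approaches need to agree on edge values, passing `right=False` to `pd.cut` is one way to line it up with the left-closed `between` call.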
Solution 3:[3]
#this should work
import pandas as pd
#creating df
data = [[2, 14, 61], [51, 73, 35], [42, 38, 13], [12, 75, 24], [27, 42, 78]]
df = pd.DataFrame(data, columns = ['city1', 'city2', 'city3'])
#sorting by given intervals
data_new = [
    [df[(df[city] > low) & (df[city] <= high)][city].count()
     for low, high in [(0, 20), (20, 40), (40, 60), (60, 80)]]
    for city in ['city1', 'city2', 'city3']
]
#creating a new df with new data
df_new = pd.DataFrame(data_new, index= ['city1', 'city2', 'city3'], columns= ['0-20', '20-40', '40-60', '60-80'])
#the key step is passing index=['city1', 'city2', 'city3'] when you create the new dataframe, so the cities become the row labels
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Scott Boston |
| Solution 2 | mosc9575 |
| Solution 3 | Chris Tang |
