Matplotlib plot with x-axis as binned data and y-axis as the mean value of various variables in the bin?

My apologies if this is rather basic; I can't seem to find a good answer yet because everything refers only to histograms. I have circular data, with a degrees value as the index. I am using pd.cut() to create bins of a few degrees in order to summarize the dataset. Then, I use df.groupby() and .mean() to calculate mean values of all columns for the respective bins.

Now - I would like to plot this, with the bins on the x-axis, and lines for the columns.

I tried to iterate over the columns, adding them as:

for i in df.columns: 
     ax.plot(df.index,df[i])

However, this gives me the error: "float() argument must be a string or number, not 'pandas._libs.interval.Interval'"

Therefore, I assume it wants the x-axis values to be numbers or strings and not intervals. Is there a way I can make this work? To get the dataframe containing the mean values of each variable with respect to bins, I used:

bins = np.arange(0,360,5)
df = df.groupby(pd.cut(df[Dir], bins)).mean()

Here is what df looks like at the point of plotting - each column includes mean values for each variable 0,1,2 etc. for each bin, which I would like plotted on y-axis, and "Dir" is the index with bins.

                        0            1            2            3          4          5
Dir                                                                        
(0, 5]          37.444135  2922.848675  3244.325904  4203.001446  36.262371  37.493497
(5, 10]         42.599494  3248.194328  3582.355759  4061.098517  36.351476  37.148341
(10, 15]        47.277694  2374.379517  2709.435714  2932.064076  36.537377  36.878293
(15, 20]        52.345712  2626.774240  2659.391040  3087.324800  36.114965  36.603918
(20, 25]        57.318976  2207.845000  2228.002353  2811.066176  36.279392  37.165979
(25, 30]        62.454386  2436.117405  2839.255696  3329.441772  36.762896  37.861577
(30, 35]        67.705955  3138.968411  3462.831977  4007.180620  36.462313  37.560977
(35, 40]        72.554786  2554.552620  2548.955581  3079.570159  36.256386  36.819579
(40, 45]        77.501479  2862.703066  2965.408491  2857.901887  36.170788  36.140976
(45, 50]        82.386679  2973.858188  2539.348967  2000.606359  36.067776  37.210645


Solution 1:[1]

There are several options. One is to plot against the midpoint of each bin, as shown below. You can also access the left and right edges of each bin through the left and right attributes of the interval index. Let me know if you need any further help.

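import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# example data: 'x' is the value to bin, 'y' is the variable to average
df = pd.DataFrame(data={'x': np.random.uniform(low=0, high=10, size=10), 'y': np.random.exponential(size=10)})
bins = range(0,360,5)
df['bin'] = pd.cut(df['x'], bins)
agg_df = df.groupby(by='bin').mean()

# this is the important step. We can obtain the interval index from the categorical input using this line.
mids = pd.IntervalIndex(agg_df.index.get_level_values('bin')).mid

# plot each aggregated column against the bin midpoints
# (use agg_df here, not df: mids has one entry per bin, not per row)
for col in agg_df.columns:
    plt.plot(mids, agg_df[col], label=col)
plt.legend()
plt.show()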
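
For a frame shaped like the one in the question, the same idea might be sketched as follows. This is only an illustration under stated assumptions: the data is synthetic, the column names "a" and "b" are made up, and the bins run to 361 so that the last bin (355, 360] exists, which differs from the question's np.arange(0,360,5). It shows the left, right and mid attributes of the interval index, plus an alternative that turns the intervals into strings so they can be used directly as tick labels.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt

# Synthetic stand-in for the asker's data; "a" and "b" are hypothetical column names.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "Dir": rng.uniform(0, 360, size=2000),
    "a": rng.normal(size=2000),
    "b": rng.normal(loc=5, size=2000),
})

# Assumption: edges go up to 360 inclusive, giving 72 bins of 5 degrees.
bins = np.arange(0, 361, 5)
binned = raw.groupby(pd.cut(raw["Dir"], bins), observed=False)[["a", "b"]].mean()

# Convert the categorical index of intervals into an IntervalIndex,
# which exposes numeric left edges, right edges and midpoints.
intervals = pd.IntervalIndex(binned.index)
left, right, mid = intervals.left, intervals.right, intervals.mid

fig, (ax_num, ax_lab) = plt.subplots(2, 1, figsize=(10, 7))

# Option 1: numeric x-axis using the bin midpoints.
for col in binned.columns:
    ax_num.plot(mid, binned[col], label=col)
ax_num.set_xlabel("Dir (bin midpoint, degrees)")
ax_num.legend()

# Option 2: string labels such as the interval's text form, treated by
# matplotlib as categories. Only the first 10 bins are drawn to keep the
# tick labels readable.
subset = binned.iloc[:10]
labels = [str(interval) for interval in subset.index]
for col in subset.columns:
    ax_lab.plot(labels, subset[col], label=col)
ax_lab.tick_params(axis="x", labelrotation=45)
ax_lab.set_xlabel("Dir bin")

fig.tight_layout()
fig.savefig("binned_means.png")
```

The numeric midpoints keep the x-axis to scale, which matters if bins are unevenly sized; the string labels are simpler but space every bin equally regardless of width.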

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
