'How can I plot a pandas dataframe where x = month and y = frequency of text?
I have the following dataset:
| Date | ID | Fruit |
|---|---|---|
| 2021-2-2 | 1 | Apple |
| 2021-2-2 | 1 | Pear |
| 2021-2-2 | 1 | Apple |
| 2021-2-2 | 2 | Pear |
| 2021-2-2 | 2 | Pear |
| 2021-2-2 | 2 | Apple |
| 2021-3-2 | 3 | Apple |
| 2021-3-2 | 3 | Apple |
I have removed duplicate "Fruit" based on ID (There can only be 1 apple per ID number but multiple apples per a single month). And now I would like to generate multiple scatter/line plots (one per "Fruit" type) with the x-axis as month (i.e. Jan. 2021, Feb. 2021, Mar. 2021, etc) and the y-axis as frequency or counts of "Fruit" that occur in that month.
If I could generate new columns in a new sheet in Excel that I could then plot as x and y that would be great too. Something like this for Apples specifically:
| Month | Number of Apples |
|---|---|
| Jan 2021 | 0 |
| Feb 2021 | 2 |
| Mar 2021 | 1 |
I've tried the following which let me remove duplicates but I can't figure out how to count the number of Apples in the Fruit column that occur within a given timeframe (month is what I'm looking for now) and set that to the y-axis.
import numpy as np
import pandas as pd
import re
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_excel('FruitExample.xlsx',
usecols=("A:E"), sheet_name=('Data'))
df_Example = df.drop_duplicates(subset=["ID Number", "Fruit"], keep="first")
df_Example.plot(x="Date", y=count("Fruit"), style="o")
plt.show()
I've tried to use groupby and categorical but can't seem to count this up properly and plot it. Here is an example of a plot that would be great.
[
]
Solution 1:[1]
Make sure the dates are in datetime format
df['Date']=pd.to_datetime(df['Date'])
Then create a column for month-year,
df['Month-Year']=df['Date'].dt.to_period('M') #M for month
new_df=pd.DataFrame(df.groupby(['Month-Year','Fruit'])['ID'].count())
new_df.reset_index(inplace=True)
Make sure to change back to datetime as seaborn can't handle 'period' type
new_df['Month-Year']=new_df['Month-Year'].apply(lambda x: x.to_timestamp())
Then plot,
import seaborn as sns
sns.lineplot(x='Month-Year',y='ID',data=new_df,hue='Fruit')
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Reza Rahemtola |
