'How can I plot a pandas dataframe where x = month and y = frequency of text?

I have the following dataset:

Date ID Fruit
2021-2-2 1 Apple
2021-2-2 1 Pear
2021-2-2 1 Apple
2021-2-2 2 Pear
2021-2-2 2 Pear
2021-2-2 2 Apple
2021-3-2 3 Apple
2021-3-2 3 Apple

I have removed duplicate "Fruit" based on ID (There can only be 1 apple per ID number but multiple apples per a single month). And now I would like to generate multiple scatter/line plots (one per "Fruit" type) with the x-axis as month (i.e. Jan. 2021, Feb. 2021, Mar. 2021, etc) and the y-axis as frequency or counts of "Fruit" that occur in that month.

If I could generate new columns in a new sheet in Excel that I could then plot as x and y that would be great too. Something like this for Apples specifically:

Month Number of Apples
Jan 2021 0
Feb 2021 2
Mar 2021 1

I've tried the following which let me remove duplicates but I can't figure out how to count the number of Apples in the Fruit column that occur within a given timeframe (month is what I'm looking for now) and set that to the y-axis.

import numpy as np
import pandas as pd
import re
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns

df = pd.read_excel('FruitExample.xlsx',
                    usecols=("A:E"), sheet_name=('Data'))


df_Example = df.drop_duplicates(subset=["ID Number", "Fruit"], keep="first")

df_Example.plot(x="Date", y=count("Fruit"), style="o")
plt.show()

I've tried to use groupby and categorical but can't seem to count this up properly and plot it. Here is an example of a plot that would be great.

[Example Plot]



Solution 1:[1]

Make sure the dates are in datetime format

df['Date']=pd.to_datetime(df['Date'])

Then create a column for month-year,

df['Month-Year']=df['Date'].dt.to_period('M') #M for month
new_df=pd.DataFrame(df.groupby(['Month-Year','Fruit'])['ID'].count())
new_df.reset_index(inplace=True)

Make sure to change back to datetime as seaborn can't handle 'period' type

new_df['Month-Year']=new_df['Month-Year'].apply(lambda x: x.to_timestamp())

Then plot,

import seaborn as sns
sns.lineplot(x='Month-Year',y='ID',data=new_df,hue='Fruit')

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Reza Rahemtola