How to add a new column from Grouper value_counts and use in line plots?

I have a dataframe imported from Excel similar to this:

Date             ID    Chemical
2021-01-01       1      water
2021-01-01       1      acid
2021-01-03       3      water
2021-03-04       5      soda
2021-03-04       5      soda
2021-05-03       6      water
2021-05-03       6      soda
2021-05-05       8      soda

I am trying to plot a series of line plots (one per chemical type) showing the count of each chemical per month over time (counts on the y-axis, months on the x-axis). So I think I want the above table to look like this:

Chemical    Date          Count 
water     2021-01-31       2      
          2021-03-31       0      
          2021-05-31       1      
acid      2021-01-31       1      
          2021-03-31       0      
          2021-05-31       0     
soda      2021-01-31       0      
          2021-03-31       2      
          2021-05-31       2

So far I've managed to remove duplicates for the same ID number (not shown in my example) and I've got my data to look like the above but missing the "Count" heading. This has made it so I can't set the y-axis to "Count" for plotting purposes.

This is my code I've tried so far:

import numpy as np
import pandas as pd
import re
import datetime as dt
import matplotlib.pyplot as plt
import seaborn as sns


df_Test = pd.read_excel('Example.xlsx',
                        usecols="A:F", sheet_name='Data')

df_Test1 = df_Test.drop_duplicates(subset=["ID", "Chemical"], keep="first")
df_Test2 = df_Test1.copy()

df_Test2.loc[:, "Date"] = pd.to_datetime(df_Test2.loc[:, "Date"])
df_Test2["Chemical"].value_counts()
df_Test2.groupby(pd.Grouper(key="Date", freq="M"))["Chemical"].value_counts()
df_Test3 = df_Test2.groupby(["Chemical", pd.Grouper(key="Date", freq="M")])["Chemical"].value_counts()
print(df_Test3)
sns.lineplot(x="Date", y="Chemical", data=df_Test3)
plt.show()

This gives me the following output. I know the plot is wrong, because I'm not sure what to set as the y-axis value.

Chemical    Date          Chemical  
water     2021-01-31       water     2      
          2021-03-31       water     0      
          2021-05-31       water     1      
acid      2021-01-31       acid      1      
          2021-03-31       acid      0      
          2021-05-31       acid      0     
soda      2021-01-31       soda      0      
          2021-03-31       soda      2      
          2021-05-31       soda      2

How can I get the new count data to become a labeled column in the dataframe and plot it as a function of time? Also, is there a way to add missing months? So the chemical would plot as zero for that month?

Thank you!

Solution 1:^[1]

I think I managed to give you a result for the first part of your question: convert the date to a monthly period, then group by chemical and month and count the IDs.

import pandas as pd

df = pd.DataFrame(
    {
        "Date": [
            "2021-01-01",
            "2021-01-01",
            "2021-01-03",
            "2021-03-04",
            "2021-03-04",
            "2021-05-03",
            "2021-05-03",
            "2021-05-05",
        ],
        "ID": [1, 1, 3, 5, 5, 6, 6, 8],
        "Chemical": ["water", "acid", "water", "soda", "soda", "water", "soda", "soda"],
    }
)

df["Date"] = pd.to_datetime(df["Date"])
df["Date_month"] = df["Date"].dt.to_period("M")
out = df.groupby(["Chemical", "Date_month"])["ID"].count()

print(out)

Chemical  Date_month
acid      2021-01       1
soda      2021-03       2
          2021-05       2
water     2021-01       2
          2021-05       1
Name: ID, dtype: int64

If you want it to be a DataFrame again, add .reset_index() at the end of out; out.reset_index(name="Count") also gives the count column a label. I did not manage the other part, filling the missing months with a fill value of 0, sorry.
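One possible way to handle the missing months, offered as a sketch rather than a tested answer to the original Excel data: reindex the grouped counts against every chemical/month combination with a fill value of 0, then plot the labeled Count column. The names all_months, full_index, counts and plot_counts are made up for this example, and it assumes the sample frame from above rather than the real spreadsheet.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "Date": ["2021-01-01", "2021-01-01", "2021-01-03", "2021-03-04",
                 "2021-03-04", "2021-05-03", "2021-05-03", "2021-05-05"],
        "ID": [1, 1, 3, 5, 5, 6, 6, 8],
        "Chemical": ["water", "acid", "water", "soda",
                     "soda", "water", "soda", "soda"],
    }
)
df["Date"] = pd.to_datetime(df["Date"])
df["Date_month"] = df["Date"].dt.to_period("M")

# Rows per chemical and month; a Series with a two-level index.
out = df.groupby(["Chemical", "Date_month"])["ID"].count()

# Every month between the first and last observed month, including empty ones.
all_months = pd.period_range(
    df["Date_month"].min(), df["Date_month"].max(), freq="M"
)

# Every (chemical, month) pair, whether or not it occurs in the data.
full_index = pd.MultiIndex.from_product(
    [out.index.get_level_values("Chemical").unique(), all_months],
    names=["Chemical", "Date_month"],
)

# Missing pairs get a count of 0; the counts become a column named "Count".
counts = out.reindex(full_index, fill_value=0).rename("Count").reset_index()

# Convert periods to month-start timestamps so they can go on a time axis.
counts["Date"] = counts["Date_month"].dt.to_timestamp()


def plot_counts(counts):
    """Draw one line per chemical (hypothetical helper, not from the answer)."""
    # Imported here so the counting code above runs without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.lineplot(data=counts, x="Date", y="Count",
                      hue="Chemical", marker="o")
    ax.set_ylabel("Count per month")
    plt.show()
```

Calling plot_counts(counts) draws all chemicals on one set of axes. If one panel per chemical is preferred, sns.relplot(data=counts, x="Date", y="Count", col="Chemical", kind="line") should give that layout instead. Note that this counts the sample rows as they are; to match the question, apply drop_duplicates on ID and Chemical before grouping.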

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Rabinzel
