'How can I plot specific Excel data from two columns with conditions?

I have a huge spreadsheet of data that looks something like this:

Date IDNumber Item
2021-05-10 1 Apple
2021-05-10 1 Orange
2021-05-10 2 Apple
2021-05-10 2 Grape
2021-06-10 5 Apple
2021-06-10 5 Grape
2021-06-10 5 Apple
2021-07-10 7 Pear
2021-07-10 7 Orange

I would like to use Python to parse/sort/plot these data where I can plot this as number of a specific item type for a specific ID number as a function of time. I'm envisioning a bar/scatter plot with the x-axis being time as month (but being able to change this time period would be key) and y-axis amount of each item.

Example plot

Really it would be awesome if I could pull out this information and put it into a new Excel sheet with new columns of date and item frequency.

Currently I'm using this code:

import numpy as np
import pandas as pd
import re
import matplotlib.pyplot as plt

df = pd.read_excel('ExcelExample.xlsx',
                    usecols=("A:E"), sheet_name=('Data'))

df_Region1 = df.loc[(((df)["StateNumber"].isin([1]) & (df["CountyNumber"].isin([5,6]))) | ((df)["StateNumber"].isin([3]) & (df["CountyNumber"].isin([8]))))]

The last line of code is going from HUGE spreadsheet to a smaller one shown in my example table.



Solution 1:[1]

My recommendaion is to filter your DataFrame based on your desired date range and count the unique items. Something like:

x = []
start_date = '2021-05-10'
end_date= '2021-05-20'
for item in df.item.unique():
    x.append(len(df[ (df['date'] >= start_date) & (df['date'] <= end_date)  & (df['item'] == item)]))
plt.plot(df.item.unique(), x, 'o')

Don't think that's the best way to do it, but I think it'll work in your case. Ypu can see documentation for pd.value_counts and pd.groupby. Also make sure you have already converted the date string to a date format using pd.to_datetime. Hope this clear things up.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1