'How can I plot specific Excel data from two columns with conditions?
I have a huge spreadsheet of data that looks something like this:
| Date | IDNumber | Item |
|---|---|---|
| 2021-05-10 | 1 | Apple |
| 2021-05-10 | 1 | Orange |
| 2021-05-10 | 2 | Apple |
| 2021-05-10 | 2 | Grape |
| 2021-06-10 | 5 | Apple |
| 2021-06-10 | 5 | Grape |
| 2021-06-10 | 5 | Apple |
| 2021-07-10 | 7 | Pear |
| 2021-07-10 | 7 | Orange |
I would like to use Python to parse/sort/plot these data where I can plot this as number of a specific item type for a specific ID number as a function of time. I'm envisioning a bar/scatter plot with the x-axis being time as month (but being able to change this time period would be key) and y-axis amount of each item.
Really it would be awesome if I could pull out this information and put it into a new Excel sheet with new columns of date and item frequency.
Currently I'm using this code:
import numpy as np
import pandas as pd
import re
import matplotlib.pyplot as plt
df = pd.read_excel('ExcelExample.xlsx',
usecols=("A:E"), sheet_name=('Data'))
df_Region1 = df.loc[(((df)["StateNumber"].isin([1]) & (df["CountyNumber"].isin([5,6]))) | ((df)["StateNumber"].isin([3]) & (df["CountyNumber"].isin([8]))))]
The last line of code is going from HUGE spreadsheet to a smaller one shown in my example table.
Solution 1:[1]
My recommendaion is to filter your DataFrame based on your desired date range and count the unique items. Something like:
x = []
start_date = '2021-05-10'
end_date= '2021-05-20'
for item in df.item.unique():
x.append(len(df[ (df['date'] >= start_date) & (df['date'] <= end_date) & (df['item'] == item)]))
plt.plot(df.item.unique(), x, 'o')
Don't think that's the best way to do it, but I think it'll work in your case. Ypu can see documentation for pd.value_counts and pd.groupby. Also make sure you have already converted the date string to a date format using pd.to_datetime. Hope this clear things up.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
