How to plot some datasets in pandas based on different thresholds in Python
I have a data frame with 3 columns, and I want to plot a line graph based on some thresholds. Here is the data frame:
date income ratio
0 2022-03-02 37175 48.79
1 2022-02-03 37740 52.18
2 2022-01-03 40280 51.33
3 2021-12-01 40310 61.45
4 2021-11-01 39916 55.92
5 2021-10-01 39917 60.54
6 2021-09-01 42009 65.13
7 2021-08-02 43673 72.2
8 2021-07-01 43880 74.29
9 2021-06-01 43954 80.13
10 2021-05-03 43511 78.83
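For anyone who wants to run the snippets, the table above can be typed in as a DataFrame. This is a sketch that assumes the `date` column starts out as strings (the question converts it with `pd.to_datetime`) and that `income` and `ratio` are plain numbers. The values are copied from the printed table.

```python
import pandas as pd

# The question's table rebuilt by hand (newest date first, as printed).
# Assumption: dates are strings here, as they would be after reading a CSV.
dataframe = pd.DataFrame({
    "date": ["2022-03-02", "2022-02-03", "2022-01-03", "2021-12-01", "2021-11-01",
             "2021-10-01", "2021-09-01", "2021-08-02", "2021-07-01", "2021-06-01",
             "2021-05-03"],
    "income": [37175, 37740, 40280, 40310, 39916, 39917, 42009, 43673, 43880, 43954, 43511],
    "ratio": [48.79, 52.18, 51.33, 61.45, 55.92, 60.54, 65.13, 72.2, 74.29, 80.13, 78.83],
})
print(dataframe)
```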
import pandas as pd

dataframe['date'] = pd.to_datetime(dataframe['date'])
print(type(dataframe.date[0]))
newdata = dataframe.sort_values(by = 'date', ascending = True)
newdata
dataframe.plot(x='date', y='income', cmap = 'Accent')
for income in range(0,len(dataframe['income'])-1):
    if(dataframe['income'][income]<=dataframe['income'][income+1]):
        print(f' The next item in the data frame {income+1} is increasing')
    else:
        print(f' The next item in the data frame {income+1} is fluctuating')
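The loop above compares each income with the next one. `Series.diff()` does the same comparison in one step: it returns `income[i] - income[i-1]`, so `diff() >= 0` at row `i+1` is the same test as `income[i] <= income[i+1]`. Below is a minimal sketch that uses only the income column, kept in the question's original (newest-first) order so the output matches the loop's; `steps` and `labels` are illustrative names. Note that, like the loop, it follows row order rather than date order, because the loop runs on `dataframe` and not on the sorted `newdata`.

```python
import pandas as pd

# Income column from the question, in its original (newest-first) row order.
dataframe = pd.DataFrame({
    "income": [37175, 37740, 40280, 40310, 39916, 39917, 42009, 43673, 43880, 43954, 43511],
})

# diff() gives income[i] - income[i-1]; row 0 has no predecessor (NaN),
# so it is dropped with iloc[1:], just as the loop starts reporting at item 1.
steps = dataframe["income"].diff().iloc[1:]
labels = steps.map(lambda d: "increasing" if d >= 0 else "fluctuating")

for position, label in labels.items():
    print(f"The next item in the data frame {position} is {label}")
```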
First of all, I want to arrange the rows in ascending order by date, and then plot a graph that shows the following:
- Where there is a fluctuation in the series t1, t2, t3, ..., tn, i.e. t2 is less than t1, I should mark point t2 in red.
- Minor issue: if I sort by date, the index runs from the highest value to the lowest. How can I keep the index starting from 0, 1, 2, 3, 4, ..., 10 rather than 10, 9, 8, 7, 6, ..., 2, 1, 0?
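On the minor index issue: `sort_values` keeps each row's original index label, so sorting a newest-first frame by date yields labels 10, 9, ..., 0. Passing `ignore_index=True` (available in pandas 1.0 and later) relabels the result 0, 1, ..., n-1, which is equivalent to calling `.reset_index(drop=True)` afterwards. The sketch below also flags rows whose income is lower than on the previous date, i.e. the points to mark in red; the `is_drop` column name is an illustrative choice, not something from the question.

```python
import pandas as pd

# Date and income columns from the question (newest date first).
dataframe = pd.DataFrame({
    "date": ["2022-03-02", "2022-02-03", "2022-01-03", "2021-12-01", "2021-11-01",
             "2021-10-01", "2021-09-01", "2021-08-02", "2021-07-01", "2021-06-01",
             "2021-05-03"],
    "income": [37175, 37740, 40280, 40310, 39916, 39917, 42009, 43673, 43880, 43954, 43511],
})
dataframe["date"] = pd.to_datetime(dataframe["date"])

# ignore_index=True discards the old labels (10, 9, ..., 0) and numbers the
# sorted rows 0, 1, ..., n-1, the same as .reset_index(drop=True).
newdata = dataframe.sort_values(by="date", ascending=True, ignore_index=True)

# True where income fell compared with the previous date (t_i < t_{i-1}).
# The first row's diff is NaN, and NaN < 0 is False, so it is never flagged.
newdata["is_drop"] = newdata["income"].diff() < 0
print(newdata)
```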
Solution 1:[1]
Here's a way to get a graph of income vs. date with decreasing values in red and increasing/static values in blue. Assume this code completely replaces the entire block of code in your original question.
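# Imports needed by the code below.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Convert 'date' column to datetime format. I'm assuming your 'income' and 'ratio' columns are already floats.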
dataframe['date'] = pd.to_datetime(dataframe['date'])
# Sort by date and change indices to match.
dataframe = dataframe.sort_values(by = 'date', ascending = True).reset_index(drop=True)
# Get differences between consecutive incomes, with 0 as the income_diff for the very first row. Add this as a column in dataframe.
income = dataframe["income"].to_numpy().astype(float)
income_diffs = np.insert(np.diff(income), 0, 0)
dataframe["income_diffs"] = income_diffs
# Rows with 0 or positive income diffs are stored in pos_diff.
# All other rows are stored in neg_diff.
pos_diff = dataframe[dataframe["income_diffs"] >= 0]
idx = pos_diff.index.values
neg_diff = dataframe.drop(idx, axis=0)
# Plot the dates and incomes from pos_diff in blue.
plt.scatter(pos_diff["date"], pos_diff["income"], color="b", label="Increase")
# Plot the dates and incomes from neg_diff in red. These are the fluctuating values.
plt.scatter(neg_diff["date"], neg_diff["income"], color="r", label="Fluctuation")
# Some stuff to prettify the plot.
plt.xlabel("Date", labelpad = 15)
plt.ylabel("Income ($)", labelpad = 10)
plt.title("Income Fluctuations Over Time")
plt.xticks(rotation = 45)
plt.legend(loc = "lower left", frameon=False)
# Make room for the rotated date labels, then display the figure.
plt.tight_layout()
plt.show()
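The above code gives this plot:
[Plot image not included: "Income Fluctuations Over Time", Income ($) vs. Date, with decreases in red and increases in blue.]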
Let me know if you have any questions.
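The question asks for a line graph, while the answer above draws two scatter series. If you also want the connecting line, one option is to plot the whole sorted series as a line and overlay red markers on the drops. This is a sketch, not the answerer's code: it uses `Series.diff()` in place of `np.insert(np.diff(...), 0, 0)` (both give the change from the previous row; `diff()` leaves the first row as NaN, which never counts as a drop), and the `is_drop` name, colors, and labels are illustrative.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Date and income columns from the question (newest date first).
dataframe = pd.DataFrame({
    "date": ["2022-03-02", "2022-02-03", "2022-01-03", "2021-12-01", "2021-11-01",
             "2021-10-01", "2021-09-01", "2021-08-02", "2021-07-01", "2021-06-01",
             "2021-05-03"],
    "income": [37175, 37740, 40280, 40310, 39916, 39917, 42009, 43673, 43880, 43954, 43511],
})

# Parse dates, sort oldest-first, and renumber the index 0..n-1.
dataframe["date"] = pd.to_datetime(dataframe["date"])
dataframe = dataframe.sort_values("date", ignore_index=True)

# A point is a drop when its income is lower than the previous point's.
is_drop = dataframe["income"].diff() < 0

fig, ax = plt.subplots()
# Draw the full series as one line so the overall trend stays visible.
ax.plot(dataframe["date"], dataframe["income"], color="tab:blue", marker="o", label="Income")
# Overlay red markers only on points that fell relative to the previous date.
ax.scatter(dataframe.loc[is_drop, "date"], dataframe.loc[is_drop, "income"],
           color="red", zorder=3, label="Drop vs. previous date")
ax.set_xlabel("Date")
ax.set_ylabel("Income ($)")
ax.set_title("Income Over Time")
ax.tick_params(axis="x", labelrotation=45)
ax.legend(frameon=False)
fig.tight_layout()
plt.show()
```

Drawing the line first and the red scatter on top (`zorder=3`) keeps the drop markers visible over the blue markers at the same points.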
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | AJH |