How to plot some datasets in pandas based on different thresholds in Python

I have a data frame with 3 columns, and I want to plot a line graph based on some thresholds. Here is the data frame:

      date      income  ratio
0   2022-03-02  37175   48.79
1   2022-02-03  37740   52.18
2   2022-01-03  40280   51.33
3   2021-12-01  40310   61.45
4   2021-11-01  39916   55.92
5   2021-10-01  39917   60.54
6   2021-09-01  42009   65.13
7   2021-08-02  43673   72.2
8   2021-07-01  43880   74.29
9   2021-06-01  43954   80.13
10  2021-05-03  43511   78.83


dataframe['date'] = pd.to_datetime(dataframe['date'])
print(type(dataframe.date[0]))
newdata = dataframe.sort_values(by = 'date', ascending = True)
newdata
dataframe.plot(x='date', y='income', cmap = 'Accent')
for income in range(0,len(dataframe['income'])-1):
    if(dataframe['income'][income]<=dataframe['income'][income+1]):
        print(f' The next item in the data frame {income+1} is increasing')
    else:
        print(f' The next item in the data frame {income+1} is fluctuating')

First of all, I want to sort the rows in ascending order by date, and then plot a graph that marks the fluctuations described below:

  1. For dates t1, t2, t3, ..., tn where there is fluctuation: if the value at t2 is less than the value at t1, point t2 should be marked in red.
  2. A minor issue: when I sort by date, the index runs from highest to lowest. How can I make the index run 0, 1, 2, 3, 4, ..., 10 instead of 10, 9, 8, 7, 6, ..., 2, 1, 0?


Solution 1:[1]

Here's a way to get a graph of income vs. date with decreasing values in red and increasing or unchanged values in blue. This code is meant to replace the entire block of code in your question.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Convert 'date' column to datetime format. I'm assuming your 'income' and 'ratio' columns are already numeric.
dataframe['date'] = pd.to_datetime(dataframe['date'])

# Sort by date and change indices to match.
dataframe = dataframe.sort_values(by = 'date', ascending = True).reset_index(drop=True)

# Get differences between consecutive incomes, with 0 as the income_diff for the very first row. Add this as a column in dataframe.
income = dataframe["income"].to_numpy().astype(float)
income_diffs = np.insert(np.diff(income), 0, 0)
dataframe["income_diffs"] = income_diffs

# Rows with 0 or positive income diffs are stored in pos_diff.
# All other rows (negative income diffs) are stored in neg_diff.
pos_diff = dataframe[dataframe["income_diffs"] >= 0]
neg_diff = dataframe[dataframe["income_diffs"] < 0]

# Plot the dates and incomes from pos_diff in blue.
plt.scatter(pos_diff["date"], pos_diff["income"], color="b", label="Increase")

# Plot the dates and incomes from neg_diff in red. These are the fluctuating values.
plt.scatter(neg_diff["date"], neg_diff["income"], color="r", label="Fluctuation")

# Some stuff to prettify the plot.
plt.xlabel("Date", labelpad = 15)
plt.ylabel("Income ($)", labelpad = 10)
plt.title("Income Fluctuations Over Time")

plt.xticks(rotation = 45)
plt.legend(loc = "lower left", frameon=False)

# Display the figure (needed when running as a script rather than in a notebook).
plt.show()

The above code gives this plot (image: "Income plot").
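Since the question asks for a line graph, the same idea can also be sketched as a line with the decreasing points overlaid in red. This is only a minimal sketch and not part of the answer's code: the helper names mark_drops and plot_income and the is_drop column are made up for illustration, the sample frame is retyped by hand from the question, and Series.diff() is swapped in for np.diff. It also shows reset_index(drop=True) renumbering the index from 0 after sorting.

```python
import pandas as pd

# Sample data retyped from the question (the 'ratio' column is left out for brevity).
dataframe = pd.DataFrame({
    "date": ["2022-03-02", "2022-02-03", "2022-01-03", "2021-12-01",
             "2021-11-01", "2021-10-01", "2021-09-01", "2021-08-02",
             "2021-07-01", "2021-06-01", "2021-05-03"],
    "income": [37175, 37740, 40280, 40310, 39916, 39917,
               42009, 43673, 43880, 43954, 43511],
})
dataframe["date"] = pd.to_datetime(dataframe["date"])


def mark_drops(df):
    """Sort by date, renumber the index from 0, and flag rows where income fell.

    Hypothetical helper, named here only for illustration.
    """
    out = df.sort_values("date").reset_index(drop=True)
    # diff() gives NaN for the first row; NaN < 0 is False,
    # so the first row is never flagged as a drop.
    out["is_drop"] = out["income"].diff() < 0
    return out


def plot_income(df):
    """Draw income as a blue line and overlay red markers on the drops.

    Hypothetical helper; expects the 'is_drop' column added by mark_drops.
    """
    # Imported here so the data step above can run without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.plot(df["date"], df["income"], color="b", marker="o", label="Income")
    drops = df[df["is_drop"]]
    # zorder=3 keeps the red markers on top of the line.
    ax.scatter(drops["date"], drops["income"], color="r", zorder=3, label="Decrease")
    ax.set_xlabel("Date")
    ax.set_ylabel("Income")
    ax.tick_params(axis="x", labelrotation=45)
    ax.legend(frameon=False)
    return ax


marked = mark_drops(dataframe)
print(marked[["date", "income", "is_drop"]])
```

To see the figure, call plot_income(marked) and then matplotlib.pyplot.show(). Keeping the flag as a boolean column means the same mask can be reused for printing, filtering, or plotting.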

Let me know if you have any questions.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    AJH