Creating multiple lines in plot grouped by selected column of larger dataframe
I have a dataframe; a small sample is shown below:
import datetime
import pandas as pd
ids =[1, 2, 3, 1, 2, 3]
vals = [3, 5, 6, 3, 7, 8]
lats = [10, 10, 10, 30, 30, 30]
ratio = [.1, .4, .2, .3, .4, .5,]
df = pd.DataFrame({'ids' : ids, 'vals' : vals, 'lats' : lats, 'ratio' : ratio})
>>> df
ids vals lats ratio
0 1 3 10 0.1
1 2 5 10 0.4
2 3 6 10 0.2
3 1 3 30 0.3
4 2 7 30 0.4
5 3 8 30 0.5
I want to create a graph with one line per value of the ids column, with ratio on the y-axis and lats on the x-axis. All the questions I've found use groupby or pivot on the whole dataframe rather than on a selection of its columns.
I need to make more graphs from my real dataframe, which has many more columns, so I would like to know how to plot this by selecting specific columns.
Solution 1:[1]
You can use the groupby function followed by a for loop, then call the plot function for each group, passing the desired columns as x and y (in this order, if you want the plot described in the question).
import matplotlib.pyplot as plt
...
...
x_axis = 'lats' # specified columns
y_axis = 'ratio' # specified columns
groups = df.groupby('ids')
for n, g in groups:
    plt.plot(g[x_axis], g[y_axis], label=f'ID-{n}')
plt.xlabel(x_axis.capitalize())
plt.ylabel(y_axis.capitalize())
plt.legend()
plt.grid(True)
plt.show()
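Since the same kind of plot is needed for other columns of a larger dataframe, the loop above can be wrapped in a small function. The following is a minimal sketch, not part of the original answer: `plot_grouped_lines` is a hypothetical helper name, and sorting by the x column is an added assumption so that each line is drawn left to right.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_grouped_lines(df, x, y, group, ax=None):
    """Draw one line per unique value of `group`, using only columns x and y.

    Hypothetical helper for illustration; not part of pandas or matplotlib.
    """
    if ax is None:
        _, ax = plt.subplots()
    # Sorting by x (an assumption) keeps each line drawn left to right.
    for key, grp in df.sort_values(x).groupby(group):
        ax.plot(grp[x], grp[y], label=f"{group}={key}")
    ax.set_xlabel(x)
    ax.set_ylabel(y)
    ax.legend()
    return ax


# Sample frame from the question.
df = pd.DataFrame({
    "ids": [1, 2, 3, 1, 2, 3],
    "vals": [3, 5, 6, 3, 7, 8],
    "lats": [10, 10, 10, 30, 30, 30],
    "ratio": [0.1, 0.4, 0.2, 0.3, 0.4, 0.5],
})

ax = plot_grouped_lines(df, x="lats", y="ratio", group="ids")
# Call plt.show() to display the figure in an interactive session.
```

Only the three named columns are touched, so the rest of the dataframe can stay as it is. Another column pair, for example `y="vals"`, can be plotted with the same call.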
Another way to plot pandas dataframe columns is to pass the dataframe as the data argument of the plot function and give the column names as strings:
Instead of passing the data itself as x and y, you provide the object through the data parameter and pass only the column labels for x and y.
Even so, you still have to pass the group dataframe on each iteration:
plt.plot('lats', 'ratio', data=g)
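Put together with the groupby loop, the `data` variant might look like the sketch below. It reuses the sample dataframe from the question; the use of an explicit `Axes` object (`ax`) instead of the `plt` functions is a choice made here, not something the original answer prescribes.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample frame from the question.
df = pd.DataFrame({
    "ids": [1, 2, 3, 1, 2, 3],
    "vals": [3, 5, 6, 3, 7, 8],
    "lats": [10, 10, 10, 30, 30, 30],
    "ratio": [0.1, 0.4, 0.2, 0.3, 0.4, 0.5],
})

fig, ax = plt.subplots()
for n, g in df.groupby("ids"):
    # The strings "lats" and "ratio" are looked up as columns of `g`.
    ax.plot("lats", "ratio", data=g, label=f"ID-{n}")
ax.set_xlabel("Lats")
ax.set_ylabel("Ratio")
ax.legend()
# Call plt.show() to display the figure in an interactive session.
```

Passing `label` explicitly keeps the legend entries distinct; without it, every line would be labeled with the name of the y column.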

Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | n1colas.m |
