Visualise grouped line plots in PySpark
I have this DF (sample) and I am using PySpark in Databricks. I would like a line plot of DATE vs BALANCE, with one line for each ID, in a single frame.
+-------------+-------------+----------------+
| DATE| BALANCE| ID|
+-------------+-------------+----------------+
| 2021-07-01| 81119.73| Ax3838J|
| 2021-07-02| 81119.73| Ax3838J|
| 2021-07-03| 81119.73| Ax3838J|
| 2021-07-04| 81289.62| Ax3838J|
| 2021-07-05| 81385.62| Ax3838J|
| 2021-07-02| 81249.76| Bz3838J|
| 2021-07-03| 81249.76| Bz3838J|
| 2021-07-04| 81249.76| Bz3838J|
| 2021-07-05| 81324.28| Bz3838J|
| 2021-07-06| 81329.28| Bz3838J|
+-------------+-------------+----------------+
I can plot a single ID, but I have more than 10000 unique IDs. How can I visualise multiple line plots segmented by ID? Also, are there any smart ways to visualise the whole DF together?
DF_single.toPandas().plot.line(x='DATE', y='BALANCE')
Note: Image is for a particular ID from the actual dataset.
Solution 1:[1]
You can pivot your pandas DataFrame to turn the ID labels into separate columns containing the BALANCE values, like:
(
    DF_single
    .toPandas()
    .pivot_table(index='DATE', columns='ID', values='BALANCE')
    .plot()
)
The pivot_table function aggregates the values passed in values, so if your DataFrame has more than one value for each DATE/ID pair, you can choose the appropriate aggregation function and pass it through the aggfunc parameter (e.g. aggfunc=np.mean or aggfunc='mean'; the default is 'mean'). From the way you posed your question, you probably have only one value per DATE/ID, so aggfunc doesn't really matter in your case, but it's important to understand what pivot_table is doing.
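To make the aggregation behaviour concrete, here is a minimal sketch on a small, made-up pandas DataFrame. The balances and the duplicated (2021-07-02, Ax3838J) row are invented for illustration and are not from your dataset. It shows what the default 'mean' does with a duplicate, how passing aggfunc changes the result, and that a DATE missing for one ID becomes NaN in that ID's column:

```python
import pandas as pd

# Hypothetical data with the same column names as the question.
# The duplicate (2021-07-02, Ax3838J) row is invented to trigger aggregation.
pdf = pd.DataFrame({
    "DATE": ["2021-07-01", "2021-07-02", "2021-07-02", "2021-07-02", "2021-07-03"],
    "ID": ["Ax3838J", "Ax3838J", "Ax3838J", "Bz3838J", "Bz3838J"],
    "BALANCE": [100.0, 200.0, 400.0, 50.0, 60.0],
})

# Default aggregation is the mean of all values sharing a DATE/ID pair.
wide_mean = pdf.pivot_table(index="DATE", columns="ID", values="BALANCE")

# A different aggregation can be chosen through aggfunc.
wide_max = pdf.pivot_table(index="DATE", columns="ID", values="BALANCE", aggfunc="max")

print(wide_mean)
print(wide_max)
```

Each ID ends up as its own column and DATE becomes the index, which is the shape that .plot() draws as one line per column.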
Also, pandas's plot function by default plots lines, and it uses columns as different series and the index as the x-axis, so there's no need to specify anything else =)
You can check the doc for the pivot_table function here:
https://pandas.pydata.org/docs/reference/api/pandas.pivot_table.html
Hope that helps! Good luck =)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Luis Marcanth |

