Ranking date column in pyspark
I have the following data frame in pyspark:
>>> df.show()
+----------+------+
| date_col|counts|
+----------+------+
|2022-02-05|350647|
|2022-02-06|313091|
+----------+------+
I want to create a resultant data frame that ranks date_col in decreasing order, so the latest date gets rank 1:
>>> df.show()
+----------+------+---------+
| date_col|counts|order_col|
+----------+------+---------+
|2022-02-05|350647| 2|
|2022-02-06|313091| 1|
+----------+------+---------+
How can we achieve this?
The following script can be used to create the dataframe df:
from datetime import date
from pyspark.sql import Row
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame([
    Row(date_col=date(2022, 2, 5), counts=350647),
    Row(date_col=date(2022, 2, 6), counts=313091),
])
df.show()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
