Get first product ordered by customer
I have an existing table like the one below. I want to replace the NULLs in the first_product column with the first product each customer ordered.
INPUT
| customer_id | product | order_date_id | first_product |
|---|---|---|---|
| C0001 | apple | 20220224 | NULL |
| C0001 | pear | 20220101 | NULL |
| C0002 | strawberry | 20220224 | NULL |
| C0001 | apple | 20220206 | NULL |
OUTPUT:
| customer_id | product | order_date_id | first_product |
|---|---|---|---|
| C0001 | apple | 20220224 | pear |
| C0001 | pear | 20220101 | pear |
| C0002 | strawberry | 20220224 | strawberry |
| C0001 | apple | 20220206 | pear |
I have thought about using row numbers as below, but I'm not sure how to pull it all together. This is the code I have so far; I don't know how to use it to update the first_product column.
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY customer_id, order_date_id) AS first_occurrence
Some pseudo-code:
REPLACE first_product FROM table WITH product WHERE
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY customer_id, order_date_id) AS first_occurrence = 1
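If it helps, here is roughly how I imagine pulling the row-number idea together in Spark (an untested sketch, where orders stands in for my actual table and spark is the active SparkSession):

import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window
import spark.implicits._

val w = Window.partitionBy($"customer_id").orderBy($"order_date_id")

// Keep only each customer's earliest order...
val firstOrders = orders
  .withColumn("first_occurrence", row_number().over(w))
  .filter($"first_occurrence" === 1)
  .select($"customer_id", $"product".as("first_product"))

// ...then join its product back onto every row for that customer.
val result = orders.drop("first_product")
  .join(firstOrders, Seq("customer_id"), "left")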
Solution 1:
Hey, you can use the first window function to achieve this:
import org.apache.spark.sql.functions._
import org.apache.spark.sql.expressions.Window
import spark.implicits._ // spark is the active SparkSession; needed for toDF and $"..." syntax

val cust_data = Seq[(String, String, Int, String)](
  ("C0001", "apple", 20220224, null),
  ("C0001", "pear", 20220101, null),
  ("C0002", "strawberry", 20220224, null),
  ("C0001", "apple", 20220206, null)
).toDF("cust_id", "product", "date_id", "first_prod")

// Overwrite first_prod with the product of each customer's earliest order.
val out_df = cust_data.withColumn("first_prod",
  first($"product").over(Window.partitionBy($"cust_id").orderBy($"date_id")))

out_df.show()
+-------+----------+--------+----------+
|cust_id| product| date_id|first_prod|
+-------+----------+--------+----------+
| C0001| pear|20220101| pear|
| C0001| apple|20220206| pear|
| C0001| apple|20220224| pear|
| C0002|strawberry|20220224|strawberry|
+-------+----------+--------+----------+
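This works because, once the window has an orderBy, its default frame runs from the start of the partition up to the current row, so first() always sees the customer's earliest row. If two orders could share a date_id, the result is not deterministic; one possible tie-breaker (a sketch, assuming product is an acceptable secondary sort key):

first($"product").over(Window.partitionBy($"cust_id").orderBy($"date_id", $"product"))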
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Pradeep yadav |
