'How to expand months in pyspark

I have data as below

+-----+---------+----------+
| TYPE|DTIN_MNTH|DTOUT_MNTH|
+-----+---------+----------+
|    A|  2022-03|   2022-05|
|    B|  2022-04|   2022-04|
|    C|  2022-05|   2022-07|
+-----+---------+----------+

I want to expand the DTIN_MNTH and DTOUT_MNTH in rows as below,

+-----+---------+
| TYPE|     MNTH|
+-----+---------+
|    A|  2022-03|
|    A|  2022-04|
|    A|  2022-05|
|    B|  2022-04|
|    C|  2022-05|
|    C|  2022-06|
|    C|  2022-07|
+-----+---------+


Solution 1:[1]

From Spark 2.4, you can use sequence built-in function to generate array of months between two dates, then use explode to expand array of months. You can then reformat your expanded column to month format.

It gives you the following code, with input your input dataframe:

from pyspark.sql import functions as F

result = input.select(
  F.col('TYPE'),
  F.explode(
    F.sequence(
      F.to_timestamp(F.col('DTIN_MNTH'), 'yyyy-MM'),
      F.to_timestamp(F.col('DTOUT_MNTH'), 'yyyy-MM'),
      F.expr('INTERVAL 1 MONTH'))
  ).alias('MNTH')
).withColumn('MNTH', F.date_format(F.col('MNTH'), 'yyyy-MM'))

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Vincent Doba