Add rows of data to each group in a Spark dataframe

I have this dataframe -

data = [(0,1,1,201505,3),
        (1,1,1,201506,5),
        (2,1,1,201507,7),
        (3,1,1,201508,2),
        (4,2,2,201750,3),
        (5,2,2,201751,0),
        (6,2,2,201752,1),
        (7,2,2,201753,1)
       ]
cols = ['id','item','store','week','sales']
data_df = spark.createDataFrame(data=data,schema=cols)
display(data_df)

What I want is this -

data_new = [(0,1,1,201505,3,0),
            (1,1,1,201506,5,0),
            (2,1,1,201507,7,0),
            (3,1,1,201508,2,0),
            (4,1,1,201509,0,0),
            (5,1,1,201510,0,0),
            (6,1,1,201511,0,0),
            (7,1,1,201512,0,0),
            (8,2,2,201750,3,0),
            (9,2,2,201751,0,0),
            (10,2,2,201752,1,0),
            (11,2,2,201753,1,0),
            (12,2,2,201801,0,0),
            (13,2,2,201802,0,0),
            (14,2,2,201803,0,0),
            (15,2,2,201804,0,0)]
cols_new = ['id','item','store','week','sales','flag']
data_df_new = spark.createDataFrame(data=data_new,schema=cols_new)
display(data_df_new)

So basically, I want 8 weeks of data (this could also be 6 or 10) for each item-store combination, with the missing weeks added as rows with sales = 0. Where the year's 52 or 53 weeks run out, the added weeks should continue into the next year, as shown in the sample. I need this in PySpark. Thanks in advance!
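One possible way to approach this, as a rough sketch rather than a definitive answer: find the last week and the row count of each item-store group, generate the missing weeks with a small Python helper, explode them into rows, and union them back onto the original data. The helper names (`weeks_in_year`, `following_weeks`, `pad_groups`) and the `total_weeks` parameter are made up for illustration. The rollover assumes ISO-8601 week numbering, which may not match the calendar behind the sample (2017 has no ISO week 53, so the sketch simply treats any week at or past the year's last ISO week as the end of that year). The sample shows flag = 0 on every row, so the sketch does the same.

```python
from datetime import date


def weeks_in_year(year):
    """ISO-8601 week count for a year (52 or 53); 28 Dec always falls in the last week."""
    return date(year, 12, 28).isocalendar()[1]


def following_weeks(last_week, n):
    """Return the n yyyyww values after last_week, rolling into the next year when needed."""
    year, week = divmod(last_week, 100)
    out = []
    for _ in range(n):
        if week >= weeks_in_year(year):
            year, week = year + 1, 1
        else:
            week += 1
        out.append(year * 100 + week)
    return out


def pad_groups(df, total_weeks=8):
    """Append zero-sales rows so each (item, store) group ends up with total_weeks rows."""
    # Imported here so the pure-Python helpers above can be used without Spark.
    from pyspark.sql import Window, functions as F, types as T

    next_weeks = F.udf(
        lambda last, have: following_weeks(last, max(total_weeks - have, 0)),
        T.ArrayType(T.LongType()),
    )
    # One row per missing week: aggregate per group, then explode the generated weeks.
    extra = (
        df.groupBy("item", "store")
        .agg(F.max("week").alias("last_week"), F.count("*").alias("have"))
        .select("item", "store",
                F.explode(next_weeks("last_week", "have")).alias("week"))
        .withColumn("sales", F.lit(0).cast("long"))
    )
    padded = df.select("item", "store", "week", "sales").unionByName(extra)
    # A window without partitionBy pulls all rows into one partition;
    # acceptable for a sketch, but worth revisiting on large data.
    order = Window.orderBy("item", "store", "week")
    return (
        padded.withColumn("id", F.row_number().over(order) - 1)
        .withColumn("flag", F.lit(0))
        .select("id", "item", "store", "week", "sales", "flag")
    )
```

With the sample above, `pad_groups(data_df, 8)` should add weeks 201509 to 201512 for item 1 / store 1 and 201801 to 201804 for item 2 / store 2. Passing 6 or 10 as `total_weeks` changes the target length. Groups that already have `total_weeks` rows or more are left unchanged, since the helper is asked for zero extra weeks.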

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
