DataFrame: need to get permutations (PySpark)

So I'm fairly new to the Azure Synapse/Notebooks environment. I have a DataFrame with an array of lines (structs), and I want to create a new DataFrame with the combinations of a specific property of the struct inside. I'm not sure how to proceed; all the examples I find use a simple array of numbers.

[Screenshot: struct inside array]
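To make the goal concrete, here is a minimal plain-Python sketch of the operation being asked for. It assumes each element of lines behaves like a dict and that name is the property to pair up; both are assumptions, and the helper property_pairs is hypothetical. itertools.combinations yields unordered pairs, while itertools.permutations (the word used in the title) also yields each reversed pair.

```python
from itertools import combinations, permutations


def property_pairs(lines, key="name", ordered=False):
    """Pair up one property of the structs in an array (hypothetical helper).

    ordered=False -> combinations: (a, b) only, never (b, a)
    ordered=True  -> permutations: both (a, b) and (b, a)
    """
    # Pull the chosen property out of every struct; treat a null array as empty
    values = [line[key] for line in (lines or [])]
    pick = permutations if ordered else combinations
    return list(pick(values, 2))


# Dummy data shaped like the "lines" array (values are made up)
lines = [{"name": "Soup"}, {"name": "Steak"}, {"name": "Cake"}]
print(property_pairs(lines))
# [('Soup', 'Steak'), ('Soup', 'Cake'), ('Steak', 'Cake')]
print(property_pairs(lines, ordered=True))
# [('Soup', 'Steak'), ('Soup', 'Cake'), ('Steak', 'Soup'), ('Steak', 'Cake'), ('Cake', 'Soup'), ('Cake', 'Steak')]
```

With n elements, combinations gives n(n-1)/2 pairs and permutations gives n(n-1), so the choice matters for output size.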

Thanks in advance to anyone who replies 😊



Solution 1:[1]

The code below may help you create the required schema (a column lines holding an array of structs):

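from pyspark.sql import SparkSession
from pyspark.sql.types import (
    ArrayType, DoubleType, LongType, StringType, StructField, StructType,
)

spark = SparkSession.builder.getOrCreate()  # already available in a Synapse notebook

# Element type of the "lines" array: one struct per line
line_type = StructType([
    StructField("courseID", StringType(), True),
    StructField("description", StringType(), True),
    StructField("displayRecordId", StringType(), True),
    StructField("menuId", StringType(), True),
    StructField("menuItemId", StringType(), True),
    StructField("minimumCustomerAge", LongType(), True),
    StructField("modifiers", StringType(), True),
    StructField("name", StringType(), True),
    StructField("portionTypeId", StringType(), True),
    StructField("price", DoubleType(), True),
])

# Top-level schema: a single nullable column "lines" (array of nullable structs)
schema = StructType([StructField("lines", ArrayType(line_type, True), True)])

# Dummy data (hypothetical values): one row whose "lines" array holds one struct
data = [([("c1", "Soup starter", "r1", "m1", "mi1", 0, None, "Soup", "p1", 4.5)],)]

df = spark.createDataFrame(data, schema=schema)
df.printSchema()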

The data used above is dummy data.

This is the output of the above code, for reference:

[Screenshot: output of df.printSchema()]

Please refer to this link to learn more about ArrayType in PySpark.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 RakeshGovindula-MT