DataFrame: need to get permutations (PySpark)
I'm fairly new to the Azure Synapse/Notebooks environment. I have a DataFrame with an array of lines, and I want to create a new DataFrame with the combinations of a specific property of the struct inside. I'm not sure how to proceed; all the examples I find use a simple array of numbers.
Thanks in advance to anyone who replies 😊
Solution 1:[1]
The code below may work for you to create the required schema:
from pyspark.sql.types import (
    ArrayType, DoubleType, LongType, StringType, StructField, StructType,
)
schema = StructType([
    StructField("lines", ArrayType(
        StructType([
            StructField("courseID", StringType(), True),
            StructField("description", StringType(), True),
            StructField("displayRecordId", StringType(), True),
            StructField("menuId", StringType(), True),
            StructField("menuItemId", StringType(), True),
            StructField("minimumCustomerAge", LongType(), True),
            StructField("modifiers", StringType(), True),
            StructField("name", StringType(), True),
            StructField("portionTypeId", StringType(), True),
            StructField("price", DoubleType(), True),
        ]), True), True)
])
df = spark.createDataFrame(spark.sparkContext.parallelize(data), schema=schema)
df.printSchema()
The `data` variable in the code above is dummy data.
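For illustration, dummy data matching that schema might look like the sketch below. All values are hypothetical; the only assumption is that each row is a one-element tuple holding the `lines` array, and each struct is a tuple in schema field order.

```python
# Hypothetical dummy rows: one DataFrame row whose `lines` array holds two structs.
# Field order follows the schema: courseID, description, displayRecordId, menuId,
# menuItemId, minimumCustomerAge, modifiers, name, portionTypeId, price.
data = [
    (
        [
            ("1", "Starter", "rec-1", "menu-1", "item-1", 0, None, "Soup", "portion-1", 4.5),
            ("2", "Main", "rec-2", "menu-1", "item-2", 18, None, "Steak", "portion-1", 12.0),
        ],
    ),
]
```

Note that `price` must be a Python float (not an int) to match `DoubleType`, and `minimumCustomerAge` an int to match `LongType`.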
This is the output of `df.printSchema()`, for your reference:
Please refer to this link to learn more about ArrayType in PySpark.
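Since the question asks about combinations of one property inside the `lines` array, here is a minimal sketch of that step in plain Python. The function name `prop_combinations` and the choice of `menuItemId` as the property are assumptions for illustration, not something from the original post. It works on dicts or on PySpark `Row` objects, since both support `line[prop]`.

```python
from itertools import combinations


def prop_combinations(lines, prop="menuItemId", size=2):
    """Return every unordered `size`-combination of `prop` across `lines`.

    `lines` is a list of dicts or pyspark Rows (both support line[prop]).
    """
    if not lines:
        return []
    # Distinct, sorted values so each combination appears once in a stable order.
    values = sorted({line[prop] for line in lines if line[prop] is not None})
    return [list(combo) for combo in combinations(values, size)]


# Hypothetical sample mirroring one row's `lines` array.
sample_lines = [
    {"menuItemId": "item-1", "name": "Soup"},
    {"menuItemId": "item-2", "name": "Steak"},
    {"menuItemId": "item-3", "name": "Cake"},
]
pairs = prop_combinations(sample_lines)
print(pairs)
# [['item-1', 'item-2'], ['item-1', 'item-3'], ['item-2', 'item-3']]
```

In a notebook, one way to apply this would be to wrap it with `pyspark.sql.functions.udf`, using `ArrayType(ArrayType(StringType()))` as the return type, call it on the `lines` column, and then `explode` the result into one row per combination. If order matters (true permutations, as in the title), `itertools.permutations` could be swapped in for `combinations`.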
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | RakeshGovindula-MT |
