Adding 10+ headers to a PySpark DataFrame
I have a CSV file that does not have headers and consists of 49 columns. I was given a separate CSV file with the column descriptions and column names. Instead of adding a StructField 49 times (like StructField("srcip", StringType(), True)), is there another way to do it, like a function?
Thank you.
Solution 1:[1]
Assuming you have a list of column names (e.g., read from the description CSV), you can loop through it and create a proper schema:
from pyspark.sql import types as T

cols = ['a', 'b', 'c']
schema = T.StructType([T.StructField(c, T.StringType()) for c in cols])
# StructType(List(StructField(a,StringType,true),StructField(b,StringType,true),StructField(c,StringType,true)))
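For a more end-to-end version, a sketch along these lines could pull the column names straight from the description file and apply the resulting schema when reading the headerless data. The file names column_descriptions.csv and data.csv and the column_name field are placeholders for the question's actual files; adjust them accordingly:

from pyspark.sql import SparkSession
from pyspark.sql import types as T

spark = SparkSession.builder.getOrCreate()

# Read the description file; assumes it has a header row with a
# "column_name" column (a placeholder, match it to your actual file)
desc = spark.read.csv("column_descriptions.csv", header=True)
cols = [row["column_name"] for row in desc.collect()]

# Build a schema with one nullable StringType field per column name
schema = T.StructType([T.StructField(c, T.StringType(), True) for c in cols])

# Apply the schema when reading the headerless 49-column data file
df = spark.read.csv("data.csv", schema=schema, header=False)
df.printSchema()

All 49 columns are read as strings here; if the description file also lists data types, you could map them to the corresponding pyspark.sql.types classes when building each StructField.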
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | pltc |
