Adding a nullable column in a Spark DataFrame
In Spark, columns added as literals with F.lit are marked non-nullable:
from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,)], ['c1'])
df = df.withColumn('c2', F.lit('a'))
df.printSchema()
# root
# |-- c1: long (nullable = true)
# |-- c2: string (nullable = false)
How can I create a nullable column?
Solution 1:[1]
The shortest method I've found uses when without an otherwise clause. Rows matching no condition would get NULL, so Spark marks the resulting column nullable; here the condition is always true, so every row still gets 'a':
df = df.withColumn('c2', F.when(F.lit(True), F.lit('a')))
In Scala: .withColumn("c2", when(lit(true), lit("a")))
Full test result:
from pyspark.sql import SparkSession, functions as F
spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1,)], ['c1'])
df = df.withColumn('c2', F.when(F.lit(True), F.lit('a')))
df.show()
# +---+---+
# | c1| c2|
# +---+---+
# | 1| a|
# +---+---+
df.printSchema()
# root
# |-- c1: long (nullable = true)
# |-- c2: string (nullable = true)
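If you need to control nullability more generally (not just for literal columns), a heavier alternative is to rebuild the DataFrame with an explicit schema. This is a sketch, not part of the original answer; set_nullable is a hypothetical helper that round-trips through the RDD to apply the modified schema:
from pyspark.sql import SparkSession, functions as F, types as T
spark = SparkSession.builder.getOrCreate()

def set_nullable(df, col_name, nullable=True):
    # Copy the schema, flipping the nullable flag on the requested column.
    fields = [
        T.StructField(f.name, f.dataType,
                      nullable if f.name == col_name else f.nullable)
        for f in df.schema.fields
    ]
    # Rebuilding from the RDD applies the new schema (at the cost of
    # leaving the DataFrame API for one step).
    return spark.createDataFrame(df.rdd, T.StructType(fields))

df2 = spark.createDataFrame([(1,)], ['c1']).withColumn('c2', F.lit('a'))
df2 = set_nullable(df2, 'c2')
df2.printSchema()
# root
# |-- c1: long (nullable = true)
# |-- c2: string (nullable = true)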
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Stack Overflow |
