Access global variable from UDF (User Defined Function) in Python in Spark
I am trying to alter a global variable from inside a pyspark.sql.functions.udf function in Python, but the change is not reflected in the global variable.
The reproducible example along with outputs is:
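
    from pyspark.sql.functions import udf
    from pyspark.sql.types import IntegerType, StringType, StructField, StructType

    # `spark` and `display` are assumed to be provided by the notebook environment.
    counter = 0

    schema2 = StructType([
        StructField("id", IntegerType(), True),
        StructField("name", StringType(), True)
    ])
    data2 = [(1, "A"), (2, "B")]
    df = spark.createDataFrame(data=data2, schema=schema2)

    def myFunc(column):
        # Try to count how many times the UDF is called.
        global counter
        counter = counter + 1
        return column + 5

    myFuncUDF = udf(myFunc, IntegerType())
    display(df.withColumn('id1', myFuncUDF(df.id)))
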
Output:
| id | name | id1 |
|---|---|---|
| 1 | A | 6 |
| 2 | B | 7 |
When I print the counter variable, it remains 0.
Can anyone help me understand how to access a global variable inside a UDF and alter it on each call to the UDF? Or is this not possible?
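
The behavior described above is consistent with how Spark executes Python UDFs: they run in worker processes that are separate from the driver's Python process, so `global counter` inside the UDF updates only the worker's own copy. The snippet below is a plain-Python analogy, not Spark code. It assumes a POSIX system (it uses the `fork` start method) and uses a one-process `multiprocessing` pool as a hypothetical stand-in for a Spark worker; `my_func` mirrors `myFunc` from the question but also returns the counter value it sees.

```python
import multiprocessing as mp

counter = 0  # lives in the parent ("driver") process


def my_func(column):
    """Same shape as the UDF in the question; also reports the counter it sees."""
    global counter
    counter = counter + 1
    return column + 5, counter


# Assumption: POSIX only, because the "fork" start method is unavailable on Windows.
ctx = mp.get_context("fork")

# A single worker process stands in for a Spark Python worker.
with ctx.Pool(processes=1) as pool:
    worker_results = pool.map(my_func, [1, 2])

print(worker_results)  # [(6, 1), (7, 2)]  the worker's own copy was incremented
print(counter)         # 0                 the parent's copy was never touched
```

In Spark itself, the usual tool for sending counts back to the driver is an accumulator: create it with `spark.sparkContext.accumulator(0)`, call `.add(1)` on it inside the UDF, and read `.value` on the driver after an action has run. Accumulator updates made inside transformations can be applied more than once if a task is re-executed, so the value is best treated as approximate.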
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
