How to use df.withColumn() when the second argument is a string
Suppose we have a dataframe df and do the following:
df = df.withColumn('age2', df.age + 2)
We get a new dataframe. Now suppose that the expression df.age + 2 is read in from a file, so it arrives as a string. How do you convert it into a column expression without using eval?
Solution 1:[1]
If the text is a valid Spark SQL expression, e.g., age + 2, then you can simply use expr from pyspark.sql.functions to transform it into a column:
import pyspark.sql.functions as F
df = df.withColumn('age2', F.expr('age + 2'))
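Since the question mentions reading the expression from a file, here is a minimal sketch of how that could look. It assumes a hypothetical file layout with one `name = expression` pair per line, where each expression is valid Spark SQL; the helper names `parse_column_specs` and `add_columns` are illustrative and not part of PySpark.

```python
from typing import Iterable, List, Tuple


def parse_column_specs(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Split 'name = expression' lines into (name, expression) pairs.

    The 'name = expression' layout is an assumption made for this sketch.
    """
    specs = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Split at the first '=' only, so expressions such as 'age >= 18' survive.
        name, sep, expression = line.partition("=")
        if not sep or not name.strip() or not expression.strip():
            raise ValueError(f"expected 'name = expression', got: {line!r}")
        specs.append((name.strip(), expression.strip()))
    return specs


def add_columns(df, specs):
    """Add one column per (name, expression) pair via withColumn and F.expr."""
    # Imported here so that parse_column_specs can be used without Spark installed.
    import pyspark.sql.functions as F

    for name, expression in specs:
        df = df.withColumn(name, F.expr(expression))
    return df
```

With a file containing the line `age2 = age + 2`, something like `add_columns(df, parse_column_specs(open(path)))` would then give the same result as the snippet above. Note that F.expr accepts any Spark SQL expression, so the file should come from a trusted source.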
If the text is instead Python source code, as in df.age + 2, then there are few alternatives that avoid eval or a reimplementation of it.
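To give a sense of what a narrow reimplementation might involve, the sketch below uses Python's standard ast module to translate a small, whitelisted subset of Python (attribute access on a dataframe name, numeric literals, and the four arithmetic operators) into Spark SQL text that could then be passed to F.expr. The function name `python_to_sql` and the choice of supported syntax are assumptions for illustration; anything outside the whitelist is rejected rather than evaluated.

```python
import ast

# Python operator node -> Spark SQL operator (a deliberately small whitelist).
_BIN_OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def python_to_sql(source: str, df_name: str = "df") -> str:
    """Translate a restricted Python expression such as 'df.age + 2' to Spark SQL text."""

    def convert(node: ast.AST) -> str:
        # Binary arithmetic: convert both sides and parenthesise the result.
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            op = _BIN_OPS[type(node.op)]
            return f"({convert(node.left)} {op} {convert(node.right)})"
        # df.<column>: emit the column name, quoted with backticks.
        if (
            isinstance(node, ast.Attribute)
            and isinstance(node.value, ast.Name)
            and node.value.id == df_name
        ):
            return f"`{node.attr}`"
        # Plain int or float literals (bool is excluded on purpose).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return repr(node.value)
        raise ValueError(f"unsupported syntax: {ast.dump(node)}")

    # mode="eval" parses a single expression; nothing is executed.
    return convert(ast.parse(source, mode="eval").body)
```

For example, `python_to_sql('df.age + 2')` produces the string "(`age` + 2)", which could be used as `df.withColumn('age2', F.expr(python_to_sql('df.age + 2')))`. Function calls, indexing, comparisons and everything else raise ValueError, which is the point of the whitelist; supporting more of Python means growing this translator toward the reimplementation of eval mentioned above.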
If the text is neither a valid Spark SQL expression nor valid Python code, you need to write a parser for whatever grammar that text has and write code to transform expressions in that grammar to calls into the Spark API.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Hristo Iliev |
