'PySpark: Adding elements from python list into spark.sql() statement
have list in python that is used throughout my code:
pylist = ['A', 'B', 'C', 'D']
I also have have a simple spark.sql() line that I need to execute:
query = spark.sql(
"""
SELECT col1, col2, col3
FROM database.table
WHERE col3 IN ('A', 'B', 'C', 'D')
"""
)
I want to replace the list of elements in the spark.sql() statment with the python list so that that last line in the SQL is
...
AND col3 IN pylist
I am aware of using {} and str.format but I am struggling to understand if that's the correct option here, and how that workd.
Solution 1:[1]
Convert to tuple to get "()" instead of "[]" while converting to string
sql_str="SELECT col1, col2, col3 FROM database.table WHERE col3 IN " + str(tuple(pylist))
query = spark.sql(sql_str)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | omuthu |
