PySpark: Adding elements from a Python list into a spark.sql() statement

I have a list in Python that is used throughout my code:

pylist = ['A', 'B', 'C', 'D']

I also have a simple spark.sql() statement that I need to execute:

query = spark.sql(
    """
    SELECT col1, col2, col3
    FROM database.table
    WHERE col3 IN ('A', 'B', 'C', 'D')
    """
)

I want to replace the hard-coded list of elements in the spark.sql() statement with the Python list, so that the last line of the SQL becomes something like:

...
WHERE col3 IN pylist

I am aware of {} placeholders and str.format, but I am struggling to understand whether that is the correct option here, and how it works.



Solution 1:[1]

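Convert the list to a tuple before converting it to a string, so that str() produces parentheses "()" instead of square brackets "[]", which is the syntax an SQL IN clause expects: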
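A minimal pure-Python check (no Spark session needed) of what str() produces for a list versus a tuple. It also shows one caveat of this approach: a one-element tuple keeps a trailing comma, which as far as I know Spark SQL does not accept inside IN (...).

```python
pylist = ['A', 'B', 'C', 'D']

# str() of a list uses square brackets, which SQL does not accept in IN (...)
print(str(pylist))         # ['A', 'B', 'C', 'D']

# str() of a tuple uses parentheses, matching SQL IN-list syntax
print(str(tuple(pylist)))  # ('A', 'B', 'C', 'D')

# Caveat: a one-element tuple keeps a trailing comma, e.g. IN ('A',)
print(str(tuple(['A'])))   # ('A',)
```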

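# str(tuple(pylist)) renders as ('A', 'B', 'C', 'D'), a valid SQL IN list
sql_str = "SELECT col1, col2, col3 FROM database.table WHERE col3 IN " + str(tuple(pylist))

# spark is the active SparkSession
query = spark.sql(sql_str)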
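Since the question mentions {} and str.format, here is a sketch of the same idea written with str.format. The helper name build_in_query is hypothetical, and it assumes every value is a plain string with no quotes or backslashes, so it performs no SQL escaping (building SQL from untrusted strings is open to injection). Joining the individually quoted values also avoids the trailing comma that str(tuple(...)) produces for a one-element list, e.g. ('A',).

```python
def build_in_query(values):
    """Return the SELECT statement with an IN list built from `values`.

    Hypothetical helper: assumes each value is a plain string without
    quotes or backslashes, so no SQL escaping is attempted.
    """
    if not values:
        # "IN ()" is not valid SQL, so refuse an empty list.
        raise ValueError("values must not be empty")
    for v in values:
        if "'" in v or "\\" in v:
            raise ValueError("unsupported character in value: {!r}".format(v))
    # Quote each value and join with commas; a one-element list
    # becomes ('A') with no trailing comma.
    in_list = ", ".join("'{}'".format(v) for v in values)
    return (
        "SELECT col1, col2, col3 "
        "FROM database.table "
        "WHERE col3 IN ({})".format(in_list)
    )


pylist = ['A', 'B', 'C', 'D']
sql_str = build_in_query(pylist)
print(sql_str)
# query = spark.sql(sql_str)  # requires an active SparkSession named spark
```

For the four-element list this prints the same statement as the tuple-based version; the difference only shows up for one-element or empty lists.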

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 omuthu