How to append PySpark FOR loop output into a single dataframe (spark.sql)
I have a PySpark for loop that uses a "customer" variable. I want to append the output of each loop iteration so that the final DataFrame contains all the rows produced by the loop. The code works except for the append portion; I have also tried "union", but without success.
df = ""
df_output = []
customer = ""
for customer in ['customer_abc', 'customer_xyz']:
    df = spark.sql(f"""
        SELECT sale, sum(amt) as total_sales
        FROM {customer}.salestable
        GROUP BY sale
    """).withColumn('Customer', lit(customer))
    df_output.append(df)
display(df_output)
Solution 1:[1]
Accumulate with union instead of appending to a Python list: on the first iteration assign the query result to df, and on each subsequent iteration do df = df.union(...) with that iteration's result. Appending to df_output builds a plain Python list of DataFrames, which display() cannot render as a single DataFrame.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Luiz Viola |
