replaceWhere in PySpark

I have saved a dataframe as a Delta table partitioned by [customer, site, machine, date] in overwrite mode, using replaceWhere with date >= value1 and date < value2:

df.coalesce(1).write.mode('overwrite') \
  .option("replaceWhere", "date >= '2022-04-01' AND date < '2022-04-02'") \
  .partitionBy("customer", "site", "machine", "date") \
  .format('delta').save(output_filepath)

When I execute the statement twice (first run for customer1 and second run for customer2), customer1's data for 2022-04-01 gets overwritten by customer2's.

So I added the customer to the replaceWhere clause, "(date >= '2022-04-01' and date < '2022-04-02') and (customer.in(['customervalue']))", but I get AnalysisException: Cannot recognize the predicate. What other ways are there to overwrite data only for a particular customer and a particular date?
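For context on why the predicate fails: replaceWhere expects a plain SQL condition string, not a DataFrame/Column expression, so syntax like customer.in([...]) is not recognized; an SQL-style equality or IN list should be. A minimal sketch of building such a predicate (the helper replace_where_predicate is hypothetical, only to show the string construction; df and output_filepath are from the question):

```python
def replace_where_predicate(start_date, end_date, customer):
    # Build a plain SQL predicate string, which is what replaceWhere expects.
    return (
        f"date >= '{start_date}' AND date < '{end_date}' "
        f"AND customer = '{customer}'"
    )

predicate = replace_where_predicate("2022-04-01", "2022-04-02", "customer1")
# → "date >= '2022-04-01' AND date < '2022-04-02' AND customer = 'customer1'"

# The write from the question would then become (untested sketch):
# df.coalesce(1).write.mode('overwrite') \
#   .option("replaceWhere", predicate) \
#   .partitionBy("customer", "site", "machine", "date") \
#   .format('delta').save(output_filepath)
```

For several customers, an SQL IN list such as "customer IN ('customer1', 'customer2')" should work the same way. Note that older Delta Lake versions only allow replaceWhere predicates on partition columns; here both date and customer are partition columns, so that restriction is satisfied.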

Thanks in advance!!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
