replaceWhere in PySpark
I have saved a DataFrame as a Delta table partitioned by [customer, site, machine, date], using overwrite mode with replaceWhere on date >= value1 and date < value2:
df.coalesce(1).write.mode('overwrite') \
    .option("replaceWhere", "date >= '2022-04-01' AND date < '2022-04-02'") \
    .partitionBy("customer", "site", "machine", "date") \
    .format('delta').save(output_filepath)
When I execute the statement twice (first run for customer1, second run for customer2), customer1's data for 2022-04-01 gets overwritten by customer2's.
So I added a customer condition to the replaceWhere clause: '(date >= '2022-04-01' and date < '2022-04-02') and (customer.in(['customervalue']))'.
Now I am getting the error AnalysisException: Cannot recognize the predicate.
What are the other possible ways I can overwrite data only for a particular customer and a particular date?
Thanks in advance!!
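One likely cause: replaceWhere expects a SQL predicate string, not a Python/Column expression, so `customer.in([...])` is not recognized; the SQL equivalent is `customer IN (...)`. A minimal sketch of building such a predicate (the helper name `build_replace_where` and the column names `date`/`customer` are taken from the question; this is an illustration, not a definitive fix):

```python
# Sketch: assemble the replaceWhere predicate as a SQL string,
# which is the form the Delta writer option expects.

def build_replace_where(start_date, end_date, customers):
    """Return a SQL predicate string for the Delta replaceWhere option."""
    # Quote each customer value and join them into a SQL IN list.
    customer_list = ", ".join(f"'{c}'" for c in customers)
    return (
        f"date >= '{start_date}' AND date < '{end_date}' "
        f"AND customer IN ({customer_list})"
    )

predicate = build_replace_where('2022-04-01', '2022-04-02', ['customer1'])
print(predicate)
# date >= '2022-04-01' AND date < '2022-04-02' AND customer IN ('customer1')

# The predicate would then be passed to the writer, e.g.:
# df.write.mode('overwrite') \
#     .option("replaceWhere", predicate) \
#     .partitionBy("customer", "site", "machine", "date") \
#     .format('delta').save(output_filepath)
```

Note that with string interpolation like this, the values must be trusted (they come from your own pipeline config, not user input), and every row being written must satisfy the predicate or the Delta write will fail.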
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
