SparkException: Job aborted
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 76.0 failed 4 times, most recent failure: Lost task 5.3 in stage 76.0 (TID 2334) (10.139.64.5 executor 6): com.databricks.sql.io.FileReadException: Error while reading file <File_Path> It is possible the underlying files have been updated. You can explicitly invalidate the cache in Spark by running 'REFRESH TABLE tableName' command in SQL or by recreating the Dataset/DataFrame involved. If Delta cache is stale or the underlying files have been removed, you can invalidate Delta cache manually by restarting the cluster.
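For context, here is a minimal PySpark sketch of the remedies the error message itself lists; the table name and path (`my_db.my_table`, `/mnt/delta/events`) are placeholders, not taken from the original question:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Option 1: invalidate Spark's cached file listing for the table
# (placeholder table name).
spark.sql("REFRESH TABLE my_db.my_table")

# Option 2: recreate the DataFrame so it picks up the current files
# (placeholder path).
df = spark.read.format("delta").load("/mnt/delta/events")

# Option 3: if the Delta/disk cache itself is stale, restarting the
# cluster (not shown here) clears it entirely.
```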
Solution 1:
In addition to what the answer by AbhishekKhandave-MT suggests, you can also try explicitly repairing the table:
FSCK REPAIR TABLE delta.`path/to/delta`
This also fixes scenarios where the underlying files of the table have actually been changed without the change being reflected in the `_delta_log` transaction log.
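As an illustration, a hedged PySpark sketch of running the repair against a hypothetical table path (the path below is a placeholder; the `DRY RUN` variant only previews which missing-file entries would be dropped from the log):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Preview which entries for missing files would be removed from the
# transaction log (placeholder path).
spark.sql("FSCK REPAIR TABLE delta.`/mnt/delta/events` DRY RUN").show(truncate=False)

# Actually remove transaction-log entries for files that no longer
# exist in the underlying storage.
spark.sql("FSCK REPAIR TABLE delta.`/mnt/delta/events`")
```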
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | restlessmodem |
