com.databricks.sql.io.FileReadException Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch

I am getting the following error:

com.databricks.sql.io.FileReadException: Error while reading file wasbs:[email protected]/cook/processYear=2021/processMonth=12/processDay=30/processHour=18/part-00003-tid-4178615623264760328.c000.avro.
Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch (integrity check failed), Expected value is 8P7bo1mnLPoLxVw==, retrieved bu+CiCkLm/kc6QA==.

where processYear, processMonth, processDay and processHour are partition columns.
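For context on what the "Blob hash mismatch" means: Azure Storage can attach a Content-MD5 (a base64-encoded MD5 digest) to a blob, and the client SDK compares it against the digest of the bytes it actually downloaded. A minimal sketch of that integrity check, using only the Python standard library (the sample bytes and digests here are illustrative, not taken from the real blob):

```python
import base64
import hashlib

def md5_b64(data: bytes) -> str:
    # Azure Storage represents Content-MD5 as a base64-encoded MD5 digest.
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")

# Hypothetical download: the expected digest comes from blob metadata,
# the actual digest is computed over the bytes received on the wire.
blob_bytes = b"avro file contents..."
expected_md5 = md5_b64(blob_bytes)          # what the service reported
actual_md5 = md5_b64(blob_bytes)            # what the client computed

# The SDK raises a StorageException ("Blob hash mismatch") when these differ,
# e.g. after a corrupted transfer or a blob overwritten mid-read.
assert expected_md5 == actual_md5
```

So the error in the stack trace indicates the bytes the executor received did not match the digest the storage service reported for that blob.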

However, this is actually just a WARN, and the code still proceeds to execute (I am also able to read this file separately in a notebook)... but eventually the job dies with:

WARN Lost task 9026.0 in stage 324.0 (TID 1525596, 10.139.64.16, executor 83): TaskKilled (Stage cancelled)

I am using the following Databricks and Spark configuration:

RuntimeVersion: 5.5.x-scala2.11
MasterConfiguration:
    NodeType: Standard_D32s_v3
    NumberOfNodes: 1
WorkerConfiguration:
    NodeType: Standard_D32s_v3
    NumberOfNodes: 2

This same job is deployed in several other environments with much larger data volumes, and it does not fail there. Any idea why it might fail here?

Thanks!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
