com.databricks.sql.io.FileReadException Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch

I am getting the following error:

com.databricks.sql.io.FileReadException: Error while reading file wasbs:[email protected]/cook/processYear=2021/processMonth=12/processDay=30/processHour=18/part-00003-tid-4178615623264760328.c000.avro.
Caused by: com.microsoft.azure.storage.StorageException: Blob hash mismatch (integrity check failed), Expected value is 8P7bo1mnLPoLxVw==, retrieved bu+CiCkLm/kc6QA==.

where processYear, processMonth, processDay and processHour are partition columns.
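For context on what the "Blob hash mismatch" means: Azure Storage can attach a Content-MD5 (a base64-encoded MD5 digest) to a blob, and the client SDK compares it against the digest of the bytes it actually downloaded. A minimal sketch of that integrity check, using only the Python standard library (the sample bytes and digests here are illustrative, not taken from the real blob):

```python
import base64
import hashlib

def md5_b64(data: bytes) -> str:
    # Azure Storage represents Content-MD5 as a base64-encoded MD5 digest.
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")

# Hypothetical download: the expected digest comes from blob metadata,
# the actual digest is computed over the bytes received on the wire.
blob_bytes = b"avro file contents..."
expected_md5 = md5_b64(blob_bytes)          # what the service reported
actual_md5 = md5_b64(blob_bytes)            # what the client computed

# The SDK raises a StorageException ("Blob hash mismatch") when these differ,
# e.g. after a corrupted transfer or a blob overwritten mid-read.
assert expected_md5 == actual_md5
```

So the error in the stack trace indicates the bytes the executor received did not match the digest the storage service reported for that blob.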

However, this is actually just a WARN, and the code still proceeds to execute (I am also able to read this file separately in a notebook)... but eventually the job dies with:

WARN Lost task 9026.0 in stage 324.0 (TID 1525596, 10.139.64.16, executor 83): TaskKilled (Stage cancelled)

I am using the following Databricks and Spark configuration:

RuntimeVersion: 5.5.x-scala2.11
MasterConfiguration:
    NodeType: Standard_D32s_v3
    NumberOfNodes: 1
WorkerConfiguration:
    NodeType: Standard_D32s_v3
    NumberOfNodes: 2

This same job is deployed in several other environments with much larger data volumes, and it does not fail there. Any idea why it might fail here?

Thanks!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
