Why is the Glue partition not updated, causing a Spectrum Scan Error?
We have implemented Delta Lake but have run into the following issue: a table can be created and ingested without problems, but after new data has been ingested, queries fail with a Spectrum scan error:
SQL Error [XX000]: ERROR: Spectrum Scan Error: DeltaManifest Detail:
error: Spectrum Scan Error: DeltaManifest code: 15005 context: Error fetching Delta Lake manifest [tablenamexxx]/target/_symlink_format_manifest/active_ind=Y/creation_time=2022-05-10/manifest Message: S3ServiceException:The specified key does not exist.,Status 404,Error NoSuchKey,Rid EFHMMKBZ1EG5ZRJV,ExtRid p query: 4100335 location: scan_range_manager.cpp:1182 process: worker_thread [pid=9305]
On further investigation, we found that it is caused by the table being set up with two partition columns (creation_time and active_ind). When new data is ingested, the old data is expired and all data for creation_time=2022-05-10 is moved to the inactive active_ind partition. _symlink_format_manifest is updated correctly and no longer contains an entry for creation_time=2022-05-10 with active_ind='Y', but the Glue Catalog still lists that partition. That is what causes the error.
If we manually run the statement below to drop the partition, queries work again. Why does this happen, and how can we resolve it?
ALTER TABLE XXX DROP PARTITION (creation_time = '2022-05-10', active_ind = 'Y');
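The manual workaround above could in principle be automated after each ingestion. The following is only a rough sketch in Python with boto3, not a confirmed fix: it assumes the manifest files live under a prefix such as [tablenamexxx]/target/_symlink_format_manifest/ with Hive-style `name=value` directories, and the function names, the `dry_run` flag, and all arguments are hypothetical. It compares the partitions registered in Glue with the partition directories that still have a manifest file and reports (or deletes) the ones that no longer do.

```python
from urllib.parse import unquote


def partition_from_manifest_key(key, manifest_root):
    """Turn '<root>active_ind=Y/creation_time=2022-05-10/manifest' into a dict."""
    relative = key[len(manifest_root):].strip("/")
    segments = relative.split("/")[:-1]  # drop the trailing 'manifest' file name
    return {name: unquote(value)
            for name, value in (seg.split("=", 1) for seg in segments)}


def find_stale_partitions(glue_partitions, manifest_keys, manifest_root):
    """Return Glue partitions (dicts) that have no manifest file in S3."""
    live = {
        frozenset(partition_from_manifest_key(k, manifest_root).items())
        for k in manifest_keys
        if k.startswith(manifest_root) and k.endswith("/manifest")
    }
    return [p for p in glue_partitions if frozenset(p.items()) not in live]


def drop_stale_partitions(database, table, bucket, manifest_root, dry_run=True):
    """Assumed workflow: list Glue partitions and S3 manifests, drop the difference."""
    import boto3  # imported lazily so the helpers above work without AWS access

    glue = boto3.client("glue")
    s3 = boto3.client("s3")

    # Partition values in Glue are ordered like the table's partition keys.
    table_def = glue.get_table(DatabaseName=database, Name=table)["Table"]
    key_names = [k["Name"] for k in table_def["PartitionKeys"]]

    registered = []
    pages = glue.get_paginator("get_partitions").paginate(
        DatabaseName=database, TableName=table)
    for page in pages:
        for part in page["Partitions"]:
            registered.append(dict(zip(key_names, part["Values"])))

    manifest_keys = []
    pages = s3.get_paginator("list_objects_v2").paginate(
        Bucket=bucket, Prefix=manifest_root)
    for page in pages:
        manifest_keys.extend(obj["Key"] for obj in page.get("Contents", []))

    stale = find_stale_partitions(registered, manifest_keys, manifest_root)
    if not dry_run:
        # BatchDeletePartition accepts at most 25 partitions per call.
        for i in range(0, len(stale), 25):
            glue.batch_delete_partition(
                DatabaseName=database,
                TableName=table,
                PartitionsToDelete=[
                    {"Values": [p[name] for name in key_names]}
                    for p in stale[i:i + 25]
                ],
            )
    return stale
```

With `dry_run=True` (the default in this sketch) nothing is deleted, so the returned list can be reviewed first; this matters because the comparison is a plain string match and would misfire if Glue stores partition values in a different format than the S3 paths.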
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
