Compression without dictionary
I have been testing various compression algorithms on Parquet files and have settled on Zstd.
As far as I understand, Zstd uses an adaptive dictionary unless one is explicitly specified, so it begins with an empty one. However, with a dictionary enabled, both the compressed size and the execution time are quite unsatisfactory.
The file size without a dictionary is considerably smaller than with the adaptive one (the number at the end of each name is the compression level):
- Name: C:\ParquetFiles\Zstd1 Execution time: 279 ms Size: 13738134
- Name: C:\ParquetFiles\Zstd2 Execution time: 140 ms Size: 13207017
- Name: C:\ParquetFiles\Zstd9 Execution time: 511 ms Size: 12701030
For comparison, here is the log with the adaptive dictionary:
- Name: C:\ParquetFiles\ZstdDictZstd1 Execution time: 487 ms Size: 19462825
- Name: C:\ParquetFiles\ZstdDictZstd2 Execution time: 402 ms Size: 19292513
- Name: C:\ParquetFiles\ZstdDictZstd9 Execution time: 614 ms Size: 19072779
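To put the gap in numbers, the figures from both logs can be compared level by level. This is a minimal sketch that only reuses the values reported above, assuming the Size column uses the same unit (presumably bytes; the log does not say) in both runs. The names `no_dict`, `with_dict` and `overhead` are illustrative.

```python
from typing import Tuple

# (execution time in ms, size) copied from the two logs, keyed by Zstd level.
no_dict = {1: (279, 13738134), 2: (140, 13207017), 9: (511, 12701030)}
with_dict = {1: (487, 19462825), 2: (402, 19292513), 9: (614, 19072779)}


def overhead(level: int) -> Tuple[float, float]:
    """Return (time ratio, size ratio) of the dictionary run over the plain run."""
    t0, s0 = no_dict[level]
    t1, s1 = with_dict[level]
    return t1 / t0, s1 / s0


for level in sorted(no_dict):
    t, s = overhead(level)
    print(f"level {level}: {t:.2f}x time, {s:.2f}x size")
```

For these logs, the dictionary-enabled files come out roughly 1.42x, 1.46x and 1.50x larger at levels 1, 2 and 9, and take between about 1.2x and 2.9x as long.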
Can you help me understand the significance of this? Shouldn't compression with an empty dictionary perform at least as well as Zstd compression with the dictionary disabled?
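The question hinges on the difference between compressing with no dictionary (the compressor's history window starts empty) and compressing with a preset dictionary (the window is seeded with known bytes). To stay within Python's standard library, this sketch uses zlib's preset-dictionary support (`zlib.compressobj(..., zdict=...)`) purely as a stand-in for that idea; it is not Zstd and does not involve Parquet. The record format, the `preset` bytes and the helper names `deflate`/`inflate` are hypothetical.

```python
import zlib
from typing import Optional


def deflate(data: bytes, level: int, zdict: Optional[bytes] = None) -> bytes:
    """Compress `data` with zlib, optionally seeding the window with a preset dictionary."""
    if zdict is None:
        # No dictionary: the compressor starts with an empty history window.
        comp = zlib.compressobj(level)
    else:
        # Preset dictionary: its bytes are available for back-references from the start.
        comp = zlib.compressobj(level, zdict=zdict)
    return comp.compress(data) + comp.flush()


def inflate(blob: bytes, zdict: Optional[bytes] = None) -> bytes:
    """Decompress `blob`; the same preset dictionary must be supplied if one was used."""
    decomp = zlib.decompressobj() if zdict is None else zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical input: many short, similar records.
records = b"".join(b'{"id": %d, "status": "ok"}\n' % i for i in range(50))
# Hypothetical preset dictionary containing the records' repeated structure.
preset = b'{"id": , "status": "ok"}\n'

for level in (1, 2, 9):
    plain = deflate(records, level)
    seeded = deflate(records, level, preset)
    # Both variants must round-trip to the original bytes.
    assert inflate(plain) == records and inflate(seeded, preset) == records
    print(f"level {level}: no dictionary {len(plain)} B, preset dictionary {len(seeded)} B")
```

Running the same comparison on your own data shows whether seeding the window helps at all; in general a preset dictionary only pays off when its contents actually recur in the input.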
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow