EXPORT DATA OPTIONS() in BigQuery creates multiple files of a few KB
Running the command below in BigQuery creates multiple files of a few KB each, and there seems to be no control over how the output is split. Is there any way to avoid getting multiple files when each individual file is this small?
EXECUTE IMMEDIATE '''
EXPORT DATA
OPTIONS(
uri= 'gs://<bucket-name>/demo_dir/file-name-*.parquet.snappy',
format='PARQUET',
overwrite=true,
compression='SNAPPY')
AS SELECT * FROM `bigquery-public-data.bls.c_cpi_u`;
''';
Solution 1:[1]
Yes and no. You can force the export into one file by removing the wildcard from the uri:
gs://<bucket-name>/demo_dir/file-name.parquet.snappy
However, if you want to specify an exact number of files greater than one, you cannot. The recommendation is to specify a single uri for anything under 1 GB of data and to use the wildcard for anything over 1 GB. With the wildcard, BigQuery splits the data into as many files as it needs. Note that the EXPORT DATA statement's documentation describes uri as a single-wildcard URI, so the wildcard-free form may only be accepted by extract jobs; check this before relying on it. More documentation can be found here: https://cloud.google.com/bigquery/docs/exporting-data#exporting_data_into_one_or_more_files
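The size rule above (single uri under 1 GB, wildcard otherwise) could be sketched as a small helper that builds the destination uri. This is only an illustration: the function name choose_export_uri, its parameters, and the reading of "a GB" as 2**30 bytes are assumptions, not part of BigQuery, and the table size is expected to be supplied by the caller. Whether a given export path accepts the wildcard-free form should be checked against the linked documentation.

```python
# Hypothetical helper, not a BigQuery API: it only builds a URI string.
# Assumption: "a GB" in the answer is read as 2**30 bytes.
ONE_GB = 1024 ** 3


def choose_export_uri(bucket: str, prefix: str, table_bytes: int,
                      suffix: str = ".parquet.snappy") -> str:
    """Return a single-file URI for data under 1 GB, else a wildcard URI."""
    if table_bytes < 0:
        raise ValueError("table_bytes must be non-negative")
    base = f"gs://{bucket}/{prefix}"
    if table_bytes < ONE_GB:
        # Small export: one destination object, no wildcard.
        return f"{base}{suffix}"
    # Large export: the wildcard lets BigQuery shard the output as needed.
    return f"{base}-*{suffix}"


if __name__ == "__main__":
    # Bucket name and sizes below are made-up sample inputs.
    print(choose_export_uri("my-bucket", "demo_dir/file-name", 50 * 1024))
    print(choose_export_uri("my-bucket", "demo_dir/file-name", 5 * ONE_GB))
```

The returned string would then be placed in the uri option of the export. The helper does not control how many files the wildcard form produces; that split is still decided by BigQuery.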
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Daniel Zagales |
