Reading the first few lines of files in Google Cloud Storage
When processing huge files (around 100 GB), we sometimes need to check the first or last few lines (the header and trailer lines).
The easy option is to download the entire file locally using
gsutil cp gs://bucket_name/file_name .
and then use the head/tail commands to check the header/trailer lines. That is not feasible here: it is time-consuming and incurs the cost of pulling the data out of the cloud.
It is effectively the same as running:
gsutil cat gs://bucket_name/file_name | head -1
The other options are to create an external table in GCP Tables, visualize the file in Data Studio, or read it from a Dataproc cluster or VM.
Is there any other quick option just to check the header/trailer lines of a file in Cloud Storage?
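One direction that may be worth trying is the byte-range flag of gsutil cat (-r), which asks for only part of the object instead of streaming all of it. The sketch below is not a confirmed answer from the original thread. The helper names gcs_head and gcs_tail are hypothetical, and the 64 KiB default window is an assumption that only holds if the header and trailer lines fit inside it.

```shell
#!/usr/bin/env bash
# Sketch: peek at the start or end of a large object using byte ranges.
# gcs_head / gcs_tail are hypothetical helper names, not gsutil commands.
# The 65536-byte default window is an assumption; raise it for long lines.

# Print the first LINES lines found within the first BYTES bytes of the object.
gcs_head() {
  local uri="$1" lines="${2:-1}" bytes="${3:-65536}"
  # -r start-end requests an inclusive byte range, offsets starting at 0.
  gsutil cat -r "0-$((bytes - 1))" "$uri" | head -n "$lines"
}

# Print the last LINES lines found within the final BYTES bytes of the object.
gcs_tail() {
  local uri="$1" lines="${2:-1}" bytes="${3:-65536}"
  # -r -N requests the last N bytes of the object.
  gsutil cat -r "-${bytes}" "$uri" | tail -n "$lines"
}
```

For example, gcs_head gs://bucket_name/file_name 5 would print up to five header lines, and gcs_tail gs://bucket_name/file_name 1 the last line. If a line is longer than the window, the output is cut off, so the byte count has to be chosen with the file's line length in mind.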
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
