Reading the first few lines of files in Google Cloud Storage

While processing huge files (around 100 GB each), we sometimes need to check the first or last few lines (the header and trailer lines).

The easy option is to download the entire file locally using

gsutil cp gs://bucket_name/file_name .

and then use the head/tail commands to check the header/trailer lines. That is not feasible: the download is time-consuming, and extracting data from the cloud has an associated cost.

It is the same as running:

gsutil cat gs://bucket_name/file_name | head -1

The other options are to create an external table in GCP Tables, visualize the data in Data Studio, or read the file from a Dataproc cluster or VM.

Is there any other quick way to check just the header/trailer lines of a file in Cloud Storage?
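
One possibility, offered as a minimal sketch rather than a definitive answer, is to request only a byte range of the object with the -r option of gsutil cat, so that only the start or end of the file is transferred. The function names gcs_head and gcs_tail and the 64 KiB default chunk size are hypothetical choices for illustration; the sketch assumes the file is uncompressed text and that the wanted lines fit inside the chunk.

```shell
# Sketch: peek at the header/trailer of a large GCS object by fetching only
# a byte range with `gsutil cat -r` instead of downloading the whole file.
# gcs_head, gcs_tail and the 65536-byte default are hypothetical, chosen
# for illustration only.

# Print the first LINES lines (default 1) found in the first BYTES bytes.
# Usage: gcs_head URI [LINES] [BYTES]
gcs_head() {
  local uri="$1" lines="${2:-1}" bytes="${3:-65536}"
  # "-r START-END" requests bytes START..END, inclusive.
  gsutil cat -r "0-$((bytes - 1))" "$uri" | head -n "$lines"
}

# Print the last LINES lines (default 1) found in the final BYTES bytes.
# Usage: gcs_tail URI [LINES] [BYTES]
gcs_tail() {
  local uri="$1" lines="${2:-1}" bytes="${3:-65536}"
  # "-r -NUM" requests the last NUM bytes of the object.
  gsutil cat -r "-${bytes}" "$uri" | tail -n "$lines"
}
```

With these defined, gcs_head gs://bucket_name/file_name 1 would print the header line and gcs_tail gs://bucket_name/file_name 1 the trailer line. If a line is longer than the chunk, or more trailing lines are requested than fully fit in it, the output can start or end mid-line, so the chunk size may need to be raised.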



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow