Is it possible to read a specific file directly from a zip file that is stored on S3?

I have a file called story.txt in a zip file called big.zip that is stored in an S3 bucket called zips-bucket.

I want my Python code to read the content of just story.txt without downloading or even scanning the entire big zip file. Is it possible? How?



Solution 1:[1]

Yes, this is possible. You will need the third-party smart_open package (installed with S3 support, e.g. pip install "smart_open[s3]") and the standard-library zipfile module. Say your compressed file is at s3://zips-bucket/big.zip. Do the following:

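import smart_open as so
import zipfile

# Open the S3 object as a binary file-like stream.
with so.open('s3://zips-bucket/big.zip', 'rb') as file_data:
    # ZipFile locates members through the archive's central directory.
    with zipfile.ZipFile(file_data) as z:
        # Open only the member we want and read its lines (as bytes).
        with z.open('story.txt') as zip_file_data:
            story_lines = zip_file_data.readlines()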

And that should do it!
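
Why this does not have to scan the whole archive: zipfile first reads the central directory stored at the end of the zip, then seeks straight to the requested member. As far as I understand, smart_open's S3 stream supports seek() by issuing ranged requests, so only those regions are fetched. The sketch below illustrates the zipfile side locally, with no S3 involved. CountingBytesIO, build_demo_archive and read_member are hypothetical helper names made up for this demo, and the in-memory archive is only a stand-in for big.zip.

```python
import io
import zipfile


class CountingBytesIO(io.BytesIO):
    """In-memory stand-in for a seekable remote stream; counts bytes read."""

    def __init__(self, data: bytes) -> None:
        super().__init__(data)
        self.bytes_read = 0

    def read(self, size=-1):
        chunk = super().read(size)
        self.bytes_read += len(chunk)
        return chunk


def build_demo_archive(filler_size: int = 2_000_000) -> bytes:
    """Create a zip holding a large filler member plus a small story.txt."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_STORED) as archive:
        archive.writestr("filler.bin", b"\0" * filler_size)
        archive.writestr("story.txt", "Once upon a time.\nThe end.\n")
    return buffer.getvalue()


def read_member(archive_bytes: bytes, member: str):
    """Return (member content, number of archive bytes read to get it)."""
    stream = CountingBytesIO(archive_bytes)
    with zipfile.ZipFile(stream) as archive:
        with archive.open(member) as member_file:
            content = member_file.read()
    return content, stream.bytes_read


if __name__ == "__main__":
    data = build_demo_archive()
    content, bytes_read = read_member(data, "story.txt")
    print(f"archive size: {len(data)} bytes, bytes read: {bytes_read}")
```

Running it should show that only a small fraction of the archive is read to extract story.txt. Against S3 the same access pattern would turn into a handful of ranged reads rather than a full download, assuming the stream you pass to ZipFile is seekable.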

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Dr. Arun