Reading zip files without full download

Is it possible to read the contents of a .ZIP file without fully downloading it?

I'm building a crawler and I'd rather not have to download every zip file just to index their contents.

Thanks.



Solution 1:[1]

The format suggests that the key piece of information about what's in the file (the central directory) resides at the end of it. Entries are then located through offsets recorded there, so I believe you'll need access to the whole thing.

Gzip, by contrast, can be read as a stream, I believe.
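As a rough illustration of the streaming point, here is a minimal sketch that decompresses gzip data chunk by chunk with Python's standard `zlib` module. The name `gunzip_stream` and the idea of receiving the data as an iterable of byte chunks are assumptions made for the example, not something from the answer above.

```python
import zlib


def gunzip_stream(chunks):
    """Yield decompressed data as gzip-compressed chunks arrive.

    `chunks` is any iterable of bytes objects (hypothetical input, e.g.
    pieces read from a socket or an HTTP response).
    """
    # wbits = 16 + 15 tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(31)
    for chunk in chunks:
        out = decomp.decompress(chunk)
        if out:
            yield out
    # Emit whatever is still buffered once the input ends.
    rest = decomp.flush()
    if rest:
        yield rest
```

Output becomes available before the last chunk has been received, which is what a zip archive cannot offer for its file listing, since that listing sits at the end.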

Solution 2:[2]

I don't know if this helps, as I'm not a programmer, but in Outlook you can preview zip files and see the actual content, not just the file directory (if they are previewable documents such as PDFs).

Solution 3:[3]

There is a solution implemented in ArchView: "ArchView can open archive file online without downloading the whole archive." https://addons.mozilla.org/en-US/firefox/addon/5028/

Inside archview-0.7.1.xpi, the file "archview.js" shows their JavaScript approach.

Solution 4:[4]

It's possible. All you need is a server that supports byte-range requests. Fetch the end-of-central-directory record (to learn the size and offset of the central directory), fetch the central directory (to learn where each file starts and ends), and then fetch the relevant bytes and handle them.

Here is an implementation in Python: onlinezip

[Full disclosure: I'm the author of the library.]
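The steps in this answer can be sketched roughly as follows. This is not the code of the library mentioned above; it is a minimal illustration using only the Python standard library. The names `list_zip_entries`, `http_range_reader` and `read_range` are made up for the example, and the sketch assumes a single-disk archive without ZIP64 extensions and a server that answers `Range` requests with status 206.

```python
import struct
import urllib.request

EOCD_SIG = b"PK\x05\x06"  # end-of-central-directory record signature
EOCD_LEN = 22             # fixed part of that record
CDFH_SIG = b"PK\x01\x02"  # central-directory file header signature
CDFH_LEN = 46             # fixed part of each header


def list_zip_entries(size, read_range):
    """Return (name, compressed_size, uncompressed_size, local_header_offset) tuples.

    size: total archive length in bytes.
    read_range(start, end): hypothetical callable returning the bytes
    start..end inclusive, e.g. backed by HTTP range requests.
    """
    # Step 1: fetch the tail. The end record may be followed by a comment
    # of at most 65535 bytes, so this window is guaranteed to contain it.
    tail_start = max(0, size - (EOCD_LEN + 65535))
    tail = read_range(tail_start, size - 1)
    pos = tail.rfind(EOCD_SIG)
    if pos < 0 or len(tail) - pos < EOCD_LEN:
        raise ValueError("end-of-central-directory record not found")
    eocd = struct.unpack("<4s4H2LH", tail[pos:pos + EOCD_LEN])
    total, cd_size, cd_offset = eocd[4], eocd[5], eocd[6]
    if total == 0:
        return []

    # Step 2: fetch the central directory and walk its headers.
    cd = read_range(cd_offset, cd_offset + cd_size - 1)
    entries, p = [], 0
    for _ in range(total):
        h = struct.unpack("<4s6H3L5H2L", cd[p:p + CDFH_LEN])
        if h[0] != CDFH_SIG:
            raise ValueError("bad central directory header")
        flags, csize, usize = h[3], h[8], h[9]
        name_len, extra_len, comment_len, offset = h[10], h[11], h[12], h[16]
        raw_name = cd[p + CDFH_LEN:p + CDFH_LEN + name_len]
        # Bit 11 of the flags marks a UTF-8 name; otherwise assume CP437.
        name = raw_name.decode("utf-8" if flags & 0x800 else "cp437")
        entries.append((name, csize, usize, offset))
        p += CDFH_LEN + name_len + extra_len + comment_len
    return entries


def http_range_reader(url):
    """Build a read_range callable for `url` (assumes the server honours Range)."""
    def read_range(start, end):
        req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
        with urllib.request.urlopen(req) as resp:
            if resp.status != 206:
                raise IOError("server did not return partial content")
            return resp.read()
    return read_range
```

To use it against a real server you would first need the archive size, for instance from the `Content-Length` of a HEAD request, and then call `list_zip_entries(size, http_range_reader(url))`. Step 3, fetching one member, would be a further range request starting at the returned local header offset.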


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Anon
Solution 2 Joe Raby
Solution 3 André Ricardo
Solution 4 Mr Jedi