'Is it possible to download just part of a ZIP archive (e.g. one file)? [closed]

Is there a way by which I can download only a part of a .rar or .zip file without downloading the whole file?

There is a ZIP file containing files A, B, C, and D. I only need A. Can I somehow tweak the download to download only A or if possible extract the file in the server itself and get A only?



Solution 1:[1]

The trick is to do what Sergio suggests without doing it manually. This is easy if you mount the ZIP file via an HTTP-backed virtual filesystem and then use the standard unzip command on it. This way the unzip utility's I/O calls are translated to HTTP range GETs, which means only the chunks of the ZIP file that you want get transferred over the network.

Here's an example for Linux using HTTPFS, a very lightweight virtual filesystem (it uses FUSE). There are similar tools for Windows.

Get/build httpfs:

$ wget http://sourceforge.net/projects/httpfs/files/httpfs/1.06.07.02
$ mv 1.06.07.10 httpfs_1.06.07.10.tar.bz2
$ tar -xjf httpfs_1.06.07.10.tar.bz2
$ rm httpfs
$ ./make_httpfs

Mount a remote ZIP file and extract one file from it:

$ mkdir mount_pt
$ sudo ./httpfs http://server.com/zipfile.zip mount_pt
$ sudo ls mount_pt
zipfile.zip
$ sudo unzip -p mount_pt/zipfile.zip the_file_I_want.txt > the_file_I_want.txt
$ sudo umount mount_pt

Of course you can also use whatever other tools beside the command-line one (I need sudo because it seems FUSE is set up that way on my machine, you shouldn't have to need it).

Solution 2:[2]

In a way, yes, you can.

ZIP file format says that there's a "central directory". Basically, this is a table that stores what files are in the archive and what offsets do they have.

So, using Content-Range you could download part of the file from the end (the central directory is the last thing in a ZIP file) and try to identify the central directory in it. If you succeed then you know the file list and offsets, so you can proceed and get those chunks separately and decompress them yourself.

This approach is quite error-prone and is not guaranteed to work. But so is hacking in general :-)

Another possible approach would be to build a custom server for that (see pst's answer for more details).

Solution 3:[3]

There are several ways for a normal person to be able to download an individual file from a compressed ZIP file, unfortunately they aren't common knowledge. There are some open-source tools and online web services, including:

Solution 4:[4]

You can arrange for your file to appear in the back of the ZIP file.

Download 100k:

$ curl -r -100000 https://www.keepassx.org/releases/2.0.2/KeePassX-2.0.2.zip -o tail.zip
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                             Dload  Upload   Total   Spent    Left  Speed
100   97k  100   97k    0     0  84739      0  0:00:01  0:00:01 --:--:-- 84817

Check what files we did get:

$ unzip -t tail.zip
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]:  attempt to seek before beginning of zipfile
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]:  attempt to seek before beginning of zipfile
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]:  attempt to seek before beginning of zipfile
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
error [tail.zip]:  attempt to seek before beginning of zipfile
  (please check that you have transferred or created the zipfile in the
  appropriate BINARY mode and that you have compiled UnZip properly)
    testing: KeePassX-2.0.2/share/translations/keepassx_uk.qm   OK
    testing: KeePassX-2.0.2/share/translations/keepassx_zh_CN.qm   OK
    testing: KeePassX-2.0.2/share/translations/keepassx_zh_TW.qm   OK
    testing: KeePassX-2.0.2/zlib1.dll   OK
At least one error was detected in tail.zip.

Then extract the last file:

$ unzip tail.zip KeePassX-2.0.2/zlib1.dll
Archive:  tail.zip
error [tail.zip]:  missing 7751495 bytes in zipfile
  (attempting to process anyway)
  inflating: KeePassX-2.0.2/zlib1.dll

Solution 5:[5]

I think Sergio Tulentsev's idea is brilliant.

However, if there is control over the server -- e.g., custom code can be deployed -- then it is a rather trivial operation (in the scheme of things :) to map/handle a request, extract the relevant portion of the ZIP archive, and send the data back in the HTTP stream.

The request might look like:

http://foo.bar/myfile.zip_a.jpeg

Which would mean extract -- and return -- "a.jpeg" from "myfile.zip".

(I intentionally chose this silly format so that browsers would likely choose "myfile.zip_a.jpeg" as the name in the download dialog when it appears.)

Of course, how this is implemented depends on the server/language/framework and there may already be existing solutions that support a similar operation (but I know not).

Solution 6:[6]

Based on the good input I have written a code-snippet in Powershell to show how it could work:

# demo code downloading a single DLL file from an online ZIP archive
# and extracting the DLL into memory to mount it finally to the main process.

cls
Remove-Variable * -ea 0

# definition for the ZIP archive, the file to be extracted and the checksum:
$url = 'https://github.com/sshnet/SSH.NET/releases/download/2020.0.1/SSH.NET-2020.0.1-bin.zip'
$sub = 'net40/Renci.SshNet.dll'
$md5 = '5B1AF51340F333CD8A49376B13AFCF9C'

# prepare HTTP client:
Add-Type -AssemblyName System.Net.Http
$handler = [System.Net.Http.HttpClientHandler]::new()
$client  = [System.Net.Http.HttpClient]::new($handler)

# get the length of the ZIP archive:
$req = [System.Net.HttpWebRequest]::Create($url)
$req.Method = 'HEAD'
$length = $req.GetResponse().ContentLength
$zip = [byte[]]::new($length)

# get the last 10k:
# how to get the correct length of the central ZIP directory here?
$start = $length-10kb
$end   = $length-1
$client.DefaultRequestHeaders.Add('Range', "bytes=$start-$end")
$result = $client.GetAsync($url).Result
$last10kb = $result.content.ReadAsByteArrayAsync().Result
$last10kb.CopyTo($zip, $start)

# get the block containing the DLL file:
# how to get the exact file-offset from the ZIP directory?
$start = $length-3537kb
$end   = $length-3201kb
$client.DefaultRequestHeaders.Clear()
$client.DefaultRequestHeaders.Add('Range', "bytes=$start-$end")
$result = $client.GetAsync($url).Result
$block = $result.content.ReadAsByteArrayAsync().Result
$block.CopyTo($zip, $start)

# extract the DLL file from archive:
Add-Type -AssemblyName System.IO.Compression
$stream = [System.IO.Memorystream]::new()
$stream.Write($zip,0,$zip.Length)
$archive = [System.IO.Compression.ZipArchive]::new($stream)
$entry = $archive.GetEntry($sub)
$bytes = [byte[]]::new($entry.Length)
[void]$entry.Open().Read($bytes, 0, $bytes.Length)

# check MD5:
$prov = [Security.Cryptography.MD5CryptoServiceProvider]::new().ComputeHash($bytes)
$hash = [string]::Concat($prov.foreach{$_.ToString("x2")})
if ($hash -ne $md5) {write-host 'dll has wrong checksum.' -f y ;break}

# load the DLL:
[void][System.Reflection.Assembly]::Load($bytes)

# use the single demo-call from the DLL:
$test = [Renci.SshNet.NoneAuthenticationMethod]::new('test')
'done.'

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Louis
Solution 2 Peter Mortensen
Solution 3
Solution 4 Peter Mortensen
Solution 5 Sergio Tulentsev
Solution 6