Speed up Python requests download speed (by behaving appropriately around throttling)

How can I download a file quickly using Python?

I tried different modules, such as wget, and they all take about the same time to execute. In this example I download a file from Reddit:

https://v.redd.it/rfxd2e2zhet81/DASH_1080.mp4?source=fallback

    import datetime
    import requests

    video_url = "https://v.redd.it/rfxd2e2zhet81/DASH_1080.mp4?source=fallback"
    start = datetime.datetime.now()
    print(start)
    response = requests.get(video_url)  # downloads the whole body before returning
    stop = datetime.datetime.now()
    print(stop)
    print("status: " + str(response.status_code))

output:

2022-04-14 15:59:52.258759
2022-04-14 16:02:03.791324
status: 200

Using Firefox the same request completes in seemingly less than a second.

[Screenshot: browser download]

A right click and "save video as" is not distinguishable from instant.

My understanding from researching similar questions on Stack Overflow is that this minimal example should give acceptable download times that depend only on my internet connection. https://www.speedtest.net/, configured for a single connection, gives me the following result:

[Screenshot: connection speed]

The file is about 20 MB in size and really should not take long to download.

As a control, this call finishes fast.

    video_url="https://stackoverflow.com/questions/71872663/speed-up-python-requests-download-speed"
    start = datetime.datetime.now()
    print(start)
    response = requests.get(video_url)
    stop = datetime.datetime.now()
    print(stop)
    print("status: " + str(response.status_code))

output:

2022-04-14 15:58:47.022299
2022-04-14 15:58:47.418743
status: 200

I ran the same request against a 40 MB file hosted on my own blob storage:

2022-04-14 16:07:59.304382
2022-04-14 16:08:00.729495
status: 200

Based on the speed differences between Firefox, Python, and Python against other targets, it looks like Python is being throttled.

How can I write a Python script that behaves appropriately so that it avoids being throttled?

I tried sending the same headers that Firefox used in its first request, to no avail: the outcome was the same.



Solution 1:[1]

It looks like the solution is to step outside the Python ecosystem. I tested the solution that user @Daweo suggested in the comments.

It requires an aria2 installation.

    import datetime
    from os import system

    video_url = "https://v.redd.it/rfxd2e2zhet81/DASH_1080.mp4?source=fallback"
    start = datetime.datetime.now()
    print(start)
    system("aria2c " + video_url)  # runs the external aria2c downloader
    stop = datetime.datetime.now()
    print(stop)

the output is:

2022-04-15 21:38:09.693262

04/15 21:38:09 [NOTICE] Downloading 1 item(s)

04/15 21:38:10 [NOTICE] Download complete: /.../DASH_1080.mp4

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
c2e8ce|OK  |    59MiB/s|/.../DASH_1080.mp4

Status Legend:
(OK):download completed.
2022-04-15 21:38:10.131280

So that took something like 400 ms.
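For anyone who wants to call aria2c without going through a shell, here is a minimal sketch using subprocess with an argument list, so the `?` in the URL needs no quoting. The helper names `aria2c_command` and `download_with_aria2c`, and the `out_dir` and `connections` parameters, are my own illustration and not part of the tested solution above. The `--max-connection-per-server` and `--split` options are optional; the run above used the aria2c defaults.

```python
import shutil
import subprocess
import time


def aria2c_command(url, out_dir=".", connections=4):
    """Build the aria2c argument list (hypothetical helper, not from the answer)."""
    return [
        "aria2c",
        f"--dir={out_dir}",                           # where to store the file
        f"--max-connection-per-server={connections}",  # optional, assumption
        f"--split={connections}",                      # optional, assumption
        url,
    ]


def download_with_aria2c(url, out_dir=".", connections=4):
    """Run aria2c and return the elapsed wall-clock time in seconds."""
    if shutil.which("aria2c") is None:
        raise FileNotFoundError("aria2c not found on PATH")
    start = time.perf_counter()
    # A list of arguments bypasses the shell; check=True raises if aria2c fails.
    subprocess.run(aria2c_command(url, out_dir, connections), check=True)
    return time.perf_counter() - start
```

Calling `download_with_aria2c(video_url)` would then replace the `system(...)` line, with the returned value taking the place of the two timestamps.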

Solution 2:[2]

Using Firefox the same request completes in seemingly less than a second. A right click and "save video as" is not distinguishable from instant.

Observe that you got responses with status code 206. 206 Partial Content means that several requests were sent, each presumably for a different part of the file. Once the download finishes, the parts are joined to recreate the file. This can shorten the download time, provided each part is served about as fast as a single download would be. You can emulate this behavior with requests by sending requests with the appropriate headers (see the linked description of 206 Partial Content) and running them in parallel, for example with multiprocessing. Be warned, though, that not all servers support partial requests, so weigh carefully whether the gain is worth the extra code.
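A rough sketch of that idea, under a few assumptions: the server answers a HEAD request with `Content-Length` and `Accept-Ranges: bytes`, and it honors the `Range` header. The function names `split_ranges` and `download_in_parts` are made up for illustration, and I use a thread pool instead of multiprocessing because the work is I/O-bound. I have not tested this against v.redd.it, so it may or may not avoid the slowdown described in the question.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(total_size, parts):
    """Split total_size bytes into at most `parts` inclusive (start, end) ranges."""
    if total_size <= 0 or parts <= 0:
        raise ValueError("total_size and parts must be positive")
    chunk = -(-total_size // parts)  # ceiling division
    return [(start, min(start + chunk, total_size) - 1)
            for start in range(0, total_size, chunk)]


def download_in_parts(url, parts=4, timeout=30):
    """Fetch url with parallel Range requests and return the joined bytes."""
    import requests  # third-party; imported here so split_ranges has no dependencies

    # Ask for the size first; assumes the server reports Content-Length on HEAD.
    head = requests.head(url, allow_redirects=True, timeout=timeout)
    head.raise_for_status()
    if head.headers.get("Accept-Ranges", "").lower() != "bytes":
        raise RuntimeError("server does not advertise byte-range support")
    total_size = int(head.headers["Content-Length"])

    def fetch(byte_range):
        start, end = byte_range
        response = requests.get(
            url, headers={"Range": f"bytes={start}-{end}"}, timeout=timeout
        )
        if response.status_code != 206:
            raise RuntimeError(f"expected 206, got {response.status_code}")
        return response.content

    # pool.map yields results in input order, so the parts join correctly.
    with ThreadPoolExecutor(max_workers=parts) as pool:
        return b"".join(pool.map(fetch, split_ranges(total_size, parts)))
```

For a 10-byte file and three parts, `split_ranges(10, 3)` gives `[(0, 3), (4, 7), (8, 9)]`; each pair maps directly onto a `Range: bytes=start-end` header, whose end offset is inclusive.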

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Johannes
Solution 2    Daweo