Why does urllib3 fail to download a list of files if authentication is required and headers aren't re-created?

*NOTE: I'm posting this for two reasons:

  1. It took me maybe 3-4hrs to figure out a solution (this is my first urllib3 project) and hopefully this will help others who run into this.
  2. I'm curious why urllib3 behaves as described below, as it is (to me anyway) very un-intuitive.*

I'm using urllib3 to first load a list of files and then to download the files that are on the list. The server the files are on requires authentication.

The behavior I ran into is that if I don't re-make the headers before requesting each file through the PoolManager, only the first file downloads correctly. The contents of all subsequent files are an error message from the server saying that authentication failed.

However, if I add a line that regenerates the headers (see the commented line in the code snippet below), the download works as expected. Is this intended behavior, and if so, can anyone explain why the headers can't be re-used? All they contain is my username/password, which doesn't change.
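For context, `make_headers` just builds an ordinary dict with a pre-encoded Basic auth value, so nothing about it looks request-specific or stateful (a minimal sketch; `user`/`pass` are placeholder credentials, not my real ones):

```python
import base64
import urllib3

# make_headers returns a plain dict containing the pre-encoded
# Basic auth credential; the key is lowercased by urllib3
headers = urllib3.util.make_headers(basic_auth='user:pass')
print(headers)  # {'authorization': 'Basic dXNlcjpwYXNz'}

# The value is just base64 of "user:pass"
expected = 'Basic ' + base64.b64encode(b'user:pass').decode()
assert headers['authorization'] == expected
```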

import shutil
import tqdm
import urllib3

http = urllib3.PoolManager(num_pools=10, maxsize=10, block=True)
myHeaders = urllib3.util.make_headers(basic_auth=f'{username}:{password}')
files = http.request('GET', url, headers=myHeaders)
file_list = files.data.decode('utf-8')
# Crude HTML scrape: keep only the href targets that are absolute URLs
file_list = file_list.split('<a href="')
file_list = [file.split('">')[0] for file in file_list if file.startswith('https://')]

for path in tqdm.tqdm(file_list, desc='Downloading'):
    output_fn = get_output_filename(path, output_dir)

    # Re-make the headers; without this line, every download
    # after the first fails with an authentication error
    myHeaders = urllib3.util.make_headers(basic_auth=f'{username}:{password}')

    with open(output_fn, 'wb') as out:
        r = http.request('GET', path, headers=myHeaders, preload_content=False)
        shutil.copyfileobj(r, out)
        r.release_conn()  # return the connection to the pool
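One cheaper variant I considered, assuming the problem is that the request machinery modifies the dict it is handed (I have not verified this against the urllib3 source), is to give each request a throwaway copy so the original survives intact. The sketch below only demonstrates the copy semantics; `user`/`pass` are placeholder credentials:

```python
import urllib3

myHeaders = urllib3.util.make_headers(basic_auth='user:pass')  # placeholder credentials

# Hand each request its own copy, e.g. headers=dict(myHeaders).
# Even if that copy is mutated (simulated here by popping the
# auth header), the original dict is untouched.
per_request = dict(myHeaders)
per_request.pop('authorization')
assert 'authorization' in myHeaders
```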

Thanks in advance,



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
