Multiple process curl command for URLs to output to individual files

I am attempting to curl multiple URLs in a bash command. Eventually I will be curling a large number of URLs, so I am using xargs to run multiple processes and speed things up. My file consists of some number of URLs, one per line:

https://someurl.com
https://someotherurl.com

My issue comes when attempting to output the results to separate files named after the URLs I curl. The bash command I have is:

xargs -P 5 -n 1 -I% curl -k -L % -o % < urls.txt

When I run this, I get 'Failed to create file https://someotherurl.com'.

Solution 1:[1]

You cannot create a file with / in the filename. You could do it this way:

#!/bin/bash

while IFS= read -r line
do
    echo "LINE: $line"

    if [[ "$line" != "" ]]
    then
        filename="${line#https://}"
        echo "FILENAME: $filename"
        
        # curl command from the question, writing to the sanitized filename
        curl -k -L -o "$filename" "$line"
    fi
done < urls.txt
Notes on this script:

  • it ignores empty lines
  • parameter expansion is used to remove the https:// part of each URL (see the quick demo after this list)
  • with the prefix gone, curl can create the file
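As a quick illustration, the ${line#https://} expansion removes the shortest matching prefix and leaves the rest of the URL intact:

line="https://someurl.com"
echo "${line#https://}"    # prints: someurl.com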

Note: if your URLs contain sub-directories, those path components must be removed (or replaced) as well.

Ex: you want to download https://www.example.com/some/sub/dir. The script suggested here would try to create a file named "www.example.com/some/sub/dir", which fails for the same reason. In this case, you could replace the / with _ using tr.

The script would become:

#!/bin/bash

while IFS= read -r line
do
    echo "LINE: $line"

    if [[ "$line" != "" ]]
    then
        # replace every / with _ so the result is a valid filename
        filename=$(echo "$line" | tr '/' '_')
        # after tr, the https:// prefix has become https:__ ; strip it
        filename2=${filename#https:__}
        echo "FILENAME: $filename2"

        # curl command from the question, writing to the sanitized filename
        curl -k -L -o "$filename2" "$line"
    fi
done < urls.txt
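Note that this loop downloads one URL at a time. If you still want the parallelism from the question's xargs -P 5, the same sanitization can be moved into the xargs command itself. A minimal sketch, assuming GNU xargs and the urls.txt file from the question:

xargs -P 5 -n 1 sh -c '
    filename=$(printf %s "$1" | tr "/" "_")
    curl -k -L -o "${filename#https:__}" "$1"
' sh < urls.txt

Each URL becomes one argument ($1) to the inner shell, so five downloads run concurrently while each writes to its own sanitized filename.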

Solution 2:[2]

Because your question is ambiguous, I would assume:

  1. You have a file urls.txt that contains URLs separated by LF.
  2. You want to download all URLs by curl and use each URL as its filename.

Unfortunately, that's not possible because URLs contain characters that are invalid in filenames, such as the slash /. For this case, I would suggest you encode each URL with the Base64 URL-safe alphabet (RFC 3548) before using it as a filename.

After applying this change, the script becomes:

seq 100 | xargs -I@ echo 'https://example.com?@' > urls.txt
xargs -P0 -L1 sh -c 'curl -SskL0 -o $(printf %s "$1" | uuencode -m /dev/stdout | sed "1d;\$d" | tr +/ -_) "$1"' sh < urls.txt
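Since the Base64 transform is reversible, the original URL can be recovered from any of the generated filenames. A minimal sketch, assuming GNU coreutils base64 and a hypothetical filename produced by the command above:

name="aHR0cHM6Ly9leGFtcGxlLmNvbT8x"    # hypothetical: encodes https://example.com?1
printf %s "$name" | tr -- -_ +/ | base64 -d
# prints: https://example.com?1

The tr call undoes the +/ to -_ substitution, then base64 -d decodes the name back into the URL.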

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: Nic3500
Solution 2: Weihang Jian