Reject a download below a minimum size in a bash script
I am downloading files using wget (curl would work too) like so:
wget somesite.com/files/{1..1000}.txt
I only want to download the files that are larger than a minimum size. File size is the only criterion I can use to decide whether I want a file: the file names are not descriptive, and they all have the same extension.
As I understand it, when the request is made to the server, it returns the size of the file before the download starts, so it should be possible to reject the file without needing to download it.
Is there a flag for wget or curl that can do this, or a script that adds this functionality? I found two similar questions, here and here for curl & wget respectively, but neither had an answer that met these requirements. I am looking to avoid downloading the file and then rejecting it afterwards.
Alternatively, is there another terminal-based tool I can use that can do this?
Solution 1:[1]
Alternatively, is there another terminal-based tool I can use that can do this?
Yes, you could take a look at xidel:
$ xidel -s --method=HEAD https://www.somesite.com/files/{1..1000}.txt \
-f '$url[substring-after($headers,"Content-Length: ") gt 51200]' \
--download .
--method=HEAD prevents the entire content of these text-files from being read.
-f "follows" / opens the content of urls. In this case only those $urls that, for instance, are larger than 50KB (51200 bytes).
--download downloads those text-files to the current dir.
Alternatively you can do everything with an extraction-query:
$ xidel -se '
for $x in (1 to 1000) ! x"https://www.somesite.com/files/{.}.txt"
where substring-after(
x:request({"method":"HEAD","url":$x})/headers,
"Content-Length: "
) gt 51200
return
x:request({"url":$x})/file:write-binary(
extract(url,".+/(.+)",1),
string-to-base64Binary(raw)
)
'
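For comparison, the same HEAD-first idea can be sketched in plain bash with curl. This is only a rough sketch, not part of the answer above: the helper names content_length and fetch_if_large are made up for illustration, and it assumes the server actually sends a Content-Length header in its HEAD response (some servers omit it, for example with chunked responses, in which case the file is simply skipped).

```shell
#!/usr/bin/env bash

# Hypothetical helper: read raw HTTP response headers on stdin and print
# the Content-Length value, or nothing if the header is absent.
content_length() {
  # Strip carriage returns, match the header name case-insensitively,
  # and keep the last occurrence (redirects produce several header blocks).
  tr -d '\r' | awk -F': *' '
    tolower($1) == "content-length" { len = $2 }
    END { if (len != "") print len }'
}

# Hypothetical helper: download URL only if the reported size exceeds MIN_BYTES.
fetch_if_large() {
  local url=$1 min_bytes=$2 len
  # -s: silent, -I: HEAD request only, -L: follow redirects
  len=$(curl -sIL "$url" | content_length)
  # Skip the file when the header is missing, malformed, or too small.
  if [[ $len =~ ^[0-9]+$ ]] && (( 10#$len > min_bytes )); then
    # -O: save under the remote file name in the current directory
    curl -sSLO "$url"
  fi
}

# Example call, left commented out because somesite.com is a placeholder:
# for i in {1..1000}; do
#   fetch_if_large "https://www.somesite.com/files/$i.txt" 51200
# done
```

Each URL costs one extra HEAD request, but the body of a rejected file is never transferred.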
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Reino |
