requests only partially obtains status codes for a list of URLs
The script below extracts a list of URLs from the response fetched at the address in the variable link, requests each URL, and then appends the status code to its respective URL.
Unfortunately, only some URLs end up with status codes appended to them, while others don't.
I'm not sure if the problem is with the requests library obtaining the status codes, or with the process of appending. Either way, something goes awry.
I tried dropping the session and calling requests directly. I also tried swapping the user agent for another one, and removing it entirely. Nothing worked.
I was able to reproduce the output in an online Python compiler, which suggests the problem lies in the code or in the requests library rather than in my local setup.
Additionally, I tried a different regex (http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+, one that catches all URLs) and a different list, and got similar results: some URLs with status codes, others without.
Script
import requests, re
from requests import Session
session = Session()
link = "https://web.archive.org/cdx/search/cdx?url=twitter.com/tiredarabwoman/status&matchType=prefix&filter=statuscode:200&from=20220207&to=20220209"
banana = []
y = session.get(link).text
urls = re.findall(r'https?://(?:www\.)?(?:mobile\.)?twitter\.com/(?:#!/)?\w+/status(?:es)?/\d+', y)
for url in urls:
    banana.append(f"{url}")
apple = list(set(banana))
results = []
headers = {'user-agent':'Mozilla/5.0 (compatible; DuckDuckBot-Https/1.1; https://duckduckgo.com/duckduckbot)'}
for url in apple:
    response = session.get(url, headers=headers)
    status_code = response.status_code
    results.append((url, status_code))
for url, status_code in results:
    apple.append(f"{url} {status_code}")
print(apple)
Output
['https://twitter.com/tiredarabwoman/status/1490817344339329026', 'https://twitter.com/tiredarabwoman/status/1490792906596175872', 'https://twitter.com/tiredarabwoman/status/1491208045388775426', 'https://twitter.com/tiredarabwoman/status/1491465751697502213', 'https://twitter.com/tiredarabwoman/status/1491545382832250880', 'https://twitter.com/tiredarabwoman/status/1491545181165920268', 'https://twitter.com/tiredarabwoman/status/1490855316661149697', 'https://twitter.com/tiredarabwoman/status/1491546028239110145', 'https://twitter.com/tiredarabwoman/status/1491464782150615042', 'https://twitter.com/tiredarabwoman/status/1491547108331118597', 'https://twitter.com/tiredarabwoman/status/1490597347629404164', 'https://twitter.com/tiredarabwoman/status/1491545016052981763', 'https://twitter.com/tiredarabwoman/status/1491180990127276033', 'https://twitter.com/tiredarabwoman/status/1490833920669061124', 'https://twitter.com/tiredarabwoman/status/1490823383780630529', 'https://twitter.com/tiredarabwoman/status/1491545630333976579', 'https://twitter.com/tiredarabwoman/status/1490785997243793412', 'https://twitter.com/tiredarabwoman/status/1490824991608557570', 'https://twitter.com/tiredarabwoman/status/1490854364831174657', 'https://twitter.com/tiredarabwoman/status/1491455740984573954', 'https://twitter.com/tiredarabwoman/status/1490825179076825089', 'https://twitter.com/tiredarabwoman/status/1490855625034985473', 'https://twitter.com/tiredarabwoman/status/1490821052183846915', 'https://twitter.com/tiredarabwoman/status/1491515906199068678', 'https://twitter.com/tiredarabwoman/status/1491193051779563522', 'https://twitter.com/tiredarabwoman/status/1491498731585343491', 'https://twitter.com/tiredarabwoman/status/1490642306759864325', 'https://twitter.com/tiredarabwoman/status/1491454956984881156', 'https://twitter.com/tiredarabwoman/status/1491455552282853376', 'https://twitter.com/tiredarabwoman/status/1490817344339329026 200', 
'https://twitter.com/tiredarabwoman/status/1490792906596175872 200', 'https://twitter.com/tiredarabwoman/status/1491208045388775426 200', 'https://twitter.com/tiredarabwoman/status/1491465751697502213 200', 'https://twitter.com/tiredarabwoman/status/1491545382832250880 200', 'https://twitter.com/tiredarabwoman/status/1491545181165920268 200', 'https://twitter.com/tiredarabwoman/status/1490855316661149697 200', 'https://twitter.com/tiredarabwoman/status/1491546028239110145 200', 'https://twitter.com/tiredarabwoman/status/1491464782150615042 200', 'https://twitter.com/tiredarabwoman/status/1491547108331118597 200', 'https://twitter.com/tiredarabwoman/status/1490597347629404164 200', 'https://twitter.com/tiredarabwoman/status/1491545016052981763 200', 'https://twitter.com/tiredarabwoman/status/1491180990127276033 200', 'https://twitter.com/tiredarabwoman/status/1490833920669061124 200', 'https://twitter.com/tiredarabwoman/status/1490823383780630529 200', 'https://twitter.com/tiredarabwoman/status/1491545630333976579 200', 'https://twitter.com/tiredarabwoman/status/1490785997243793412 200', 'https://twitter.com/tiredarabwoman/status/1490824991608557570 200', 'https://twitter.com/tiredarabwoman/status/1490854364831174657 200', 'https://twitter.com/tiredarabwoman/status/1491455740984573954 200', 'https://twitter.com/tiredarabwoman/status/1490825179076825089 200', 'https://twitter.com/tiredarabwoman/status/1490855625034985473 200', 'https://twitter.com/tiredarabwoman/status/1490821052183846915 200', 'https://twitter.com/tiredarabwoman/status/1491515906199068678 200', 'https://twitter.com/tiredarabwoman/status/1491193051779563522 200', 'https://twitter.com/tiredarabwoman/status/1491498731585343491 200', 'https://twitter.com/tiredarabwoman/status/1490642306759864325 200', 'https://twitter.com/tiredarabwoman/status/1491454956984881156 200', 'https://twitter.com/tiredarabwoman/status/1491455552282853376 200']
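A possible reading of the output above: every URL appears twice, first bare and then with its status code. That pattern is consistent with the final loop appending the labelled strings to apple, which still holds the bare URLs, rather than with requests failing for some of them. The following is only a minimal sketch of collecting the labelled strings in a separate list. The names label_with_status and fake_status are hypothetical and not part of the original script, and fake_status is a stand-in assumed here so the example runs without network access.

```python
from typing import Callable, Iterable, List, Tuple


def label_with_status(urls: Iterable[str], fetch_status: Callable[[str], int]) -> List[str]:
    """Return one 'url status' string per URL, in a new list."""
    # Pair every URL with the status code reported by fetch_status.
    results: List[Tuple[str, int]] = [(url, fetch_status(url)) for url in urls]
    # Build a separate list instead of appending to the list of bare URLs.
    return [f"{url} {status}" for url, status in results]


def fake_status(url: str) -> int:
    # Hypothetical stand-in for session.get(url, headers=headers).status_code.
    return 200


apple = [
    "https://twitter.com/tiredarabwoman/status/1490817344339329026",
    "https://twitter.com/tiredarabwoman/status/1490792906596175872",
]
labelled = label_with_status(apple, fake_status)
print(labelled)
```

In the original script, the same idea would amount to passing something like lambda url: session.get(url, headers=headers).status_code as fetch_status and printing the new list instead of apple. Whether the live requests all return 200 is a separate matter that this sketch does not test.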
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
