python/httpx/asyncio: httpx.RemoteProtocolError: Server disconnected without sending a response
I am attempting to optimize a simple web scraper that I made. It gets a list of urls from a table on a main page and then goes to each of those "sub" urls and gets information from those pages. I was able to successfully write it synchronously and using concurrent.futures.ThreadPoolExecutor(). However, I am trying to optimize it to use asyncio and httpx as these seem to be very fast for making hundreds of http requests.
I wrote the following script using asyncio and httpx; however, I keep getting the following errors:
httpcore.RemoteProtocolError: Server disconnected without sending a response.
RuntimeError: The connection pool was closed while 4 HTTP requests/responses were still in-flight.
It appears that I keep losing the connection when I run the script. I even tried running a synchronous version of it and got the same error. I thought the remote server might be blocking my requests; however, I am able to run my original program and go to each of the URLs from the same IP address without issue.
What would cause this exception and how do you fix it?
    import httpx
    import asyncio

    async def get_response(client, url):
        resp = await client.get(url, headers=random_user_agent())  # Gets a random user agent.
        html = resp.text
        return html

    async def main():
        async with httpx.AsyncClient() as client:
            tasks = []
            # Get list of urls to parse.
            urls = get_events('https://main-url-to-parse.com')
            # Get the responses for the detail page for each event
            for url in urls:
                tasks.append(asyncio.ensure_future(get_response(client, url)))
            detail_responses = await asyncio.gather(*tasks)
            for resp in detail_responses:
                event = get_details(resp)  # Parse url and get desired info

    asyncio.run(main())
Solution 1:[1]
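I had the same issue. The problem occurs when one of the tasks passed to asyncio.gather raises an exception. gather re-raises that exception in main(), which exits the async with block, so the httpx client's __aexit__ runs and closes the connection pool while the other requests are still in flight. You can work around it by passing return_exceptions=True to asyncio.gather, so that failures are returned as values instead of being raised: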
    async def main():
        async with httpx.AsyncClient() as client:
            tasks = []
            # Get list of urls to parse.
            urls = get_events('https://main-url-to-parse.com')
            # Get the responses for the detail page for each event
            for url in urls:
                tasks.append(asyncio.ensure_future(get_response(client, url)))
            detail_responses = await asyncio.gather(*tasks, return_exceptions=True)
            for resp in detail_responses:
                # With return_exceptions=True, a failed request comes back as an
                # exception object, so handle it (log, retry, ...) before parsing.
                if isinstance(resp, Exception):
                    continue
                event = get_details(resp)  # Parse url and get desired info
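To see the effect without hitting a real server, here is a minimal sketch that uses only the standard library. `FakeClient`, its `get` method, `fetch_all`, `split_results`, and the example URLs are hypothetical stand-ins made up for illustration; they are not part of httpx and only imitate the "client is closed when the async with block exits" behavior described above.

```python
import asyncio


class FakeClient:
    """Hypothetical stand-in for httpx.AsyncClient (not the real API)."""

    def __init__(self):
        self.closed = False

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # Mimics the real client closing its connection pool on exit.
        self.closed = True

    async def get(self, url):
        if "bad" in url:
            # Simulate one request failing quickly.
            await asyncio.sleep(0)
            raise ConnectionError(f"server disconnected: {url}")
        # Simulate a slower, successful request.
        await asyncio.sleep(0.01)
        if self.closed:
            raise RuntimeError("client closed while request was in flight")
        return f"<html>{url}</html>"


async def fetch_all(urls, return_exceptions):
    async with FakeClient() as client:
        tasks = [asyncio.ensure_future(client.get(u)) for u in urls]
        # With return_exceptions=False the first failure is raised here and
        # the async with block exits while the other tasks are still pending.
        return await asyncio.gather(*tasks, return_exceptions=return_exceptions)


def split_results(results):
    """Separate successful pages from exception objects returned by gather."""
    pages = [r for r in results if not isinstance(r, Exception)]
    errors = [r for r in results if isinstance(r, Exception)]
    return pages, errors


# Assumed example URLs; the second one triggers the simulated failure.
URLS = ["https://example.com/a", "https://example.com/bad", "https://example.com/b"]

results = asyncio.run(fetch_all(URLS, return_exceptions=True))
pages, errors = split_results(results)
```

In this sketch, `pages` holds the two successful responses and `errors` holds the single `ConnectionError`, so one bad URL no longer takes down the whole batch. Calling `fetch_all(URLS, return_exceptions=False)` instead raises the `ConnectionError` out of `gather` before the slower requests finish.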
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
