Django throws "connection to database already closed" error on Scrapy + Celery task
Context
I have a Django application running inside a Docker container. The application uses Celery and Celery-beat for asynchronous and scheduled tasks. One of those tasks scrapes text from several websites using Scrapy. It runs every minute, looking for new text on the pages. If there is new information, it creates a new object in MyModel. This logic (querying the database to check whether the data exists, then creating the object or updating its info) is performed by a custom Scrapy item pipeline.
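The check-then-create-or-update flow described above can be sketched roughly as follows. This is a minimal illustration, not the actual pipeline: `ArticlePipeline`, the `store` dictionary, and the `url`/`text` item keys are hypothetical names, and a plain dict stands in for the Django model so the snippet runs without a database.

```python
class ArticlePipeline:
    """Hypothetical Scrapy-style item pipeline (a plain class with process_item)."""

    def __init__(self, store=None):
        # Stand-in for the database table; the real pipeline queries a Django model.
        self.store = {} if store is None else store

    def process_item(self, item, spider=None):
        key = item["url"]  # assumed unique identifier of a scraped text
        existing = self.store.get(key)
        if existing is None:
            # No record yet: create one.
            self.store[key] = {"text": item["text"]}
        elif existing["text"] != item["text"]:
            # Record exists but the content changed: update it.
            existing["text"] = item["text"]
        return item


pipeline = ArticlePipeline()
pipeline.process_item({"url": "https://example.com/a", "text": "first version"})
pipeline.process_item({"url": "https://example.com/a", "text": "second version"})
```

In the real pipeline, each lookup and write goes through the Django ORM, so every processed item depends on the worker's database connection still being usable.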
Issue
In the Development environment (Docker Compose running locally, with one container for the app, one container for PostgreSQL, and further containers for other services) everything runs smoothly. However, in the Stage environment (one Docker container for the app on a DigitalOcean droplet and a self-managed PostgreSQL cluster) the task throws this error:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/base.py", line 237, in _cursor
return self._prepare_cursor(self.create_cursor(name))
File "/usr/local/lib/python3.8/site-packages/django/utils/asyncio.py", line 26, in inner
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
cursor = self.connection.cursor()
psycopg2.InterfaceError: connection already closed
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.8/site-packages/twisted/internet/defer.py", line 857, in _runCallbacks
current.result = callback( # type: ignore[misc]
File "/usr/local/lib/python3.8/site-packages/scrapy/utils/defer.py", line 150, in f
return deferred_from_coro(coro_f(*coro_args, **coro_kwargs))
File "/sites/app/services/scraper/news/pipelines.py", line 145, in process_item
existing_article = check_if_existing_article(item)
File "/sites/app/services/scraper/news/pipelines.py", line 124, in check_if_existing_article
if ProcessedArticle.objects.filter(
File "/usr/local/lib/python3.8/site-packages/django/db/models/query.py", line 809, in exists
return self.query.has_results(using=self.db)
File "/usr/local/lib/python3.8/site-packages/django/db/models/sql/query.py", line 537, in has_results
return compiler.has_results()
File "/usr/local/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1126, in has_results
return bool(self.execute_sql(SINGLE))
File "/usr/local/lib/python3.8/site-packages/django/db/models/sql/compiler.py", line 1154, in execute_sql
cursor = self.connection.cursor()
File "/usr/local/lib/python3.8/site-packages/django/utils/asyncio.py", line 26, in inner
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/base.py", line 259, in cursor
return self._cursor()
File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/base.py", line 237, in _cursor
return self._prepare_cursor(self.create_cursor(name))
File "/usr/local/lib/python3.8/site-packages/django/db/utils.py", line 90, in __exit__
raise dj_exc_value.with_traceback(traceback) from exc_value
File "/usr/local/lib/python3.8/site-packages/django/db/backends/base/base.py", line 237, in _cursor
return self._prepare_cursor(self.create_cursor(name))
File "/usr/local/lib/python3.8/site-packages/django/utils/asyncio.py", line 26, in inner
return func(*args, **kwargs)
File "/usr/local/lib/python3.8/site-packages/django/db/backends/postgresql/base.py", line 236, in create_cursor
cursor = self.connection.cursor()
django.db.utils.InterfaceError: connection already closed
Additional info
- The connection to the DB seems stable, because there are no other issues anywhere in the app, and we only use one cluster.
- If I place db.connections.close_all() in the same script, before the queries, it works for a couple of minutes but fails later.
- Once it has failed, it keeps failing every time.
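The workaround mentioned in the list above, resetting connections before every query, can be sketched as a small decorator. This is only an illustration under assumptions: `reset_before` and `check_if_existing` are hypothetical names, and `fake_reset` stands in for `db.connections.close_all` so the snippet runs without Django. As noted above, this approach only helped for a couple of minutes in the Stage environment.

```python
import functools


def reset_before(reset):
    """Return a decorator that calls `reset()` before each call to the wrapped function."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            reset()  # in the Django app this would be db.connections.close_all
            return func(*args, **kwargs)
        return wrapper
    return decorator


calls = []


def fake_reset():
    # Stand-in for db.connections.close_all; it only records that it ran.
    calls.append("reset")


@reset_before(fake_reset)
def check_if_existing(title):
    # Placeholder for the ORM query performed by the pipeline.
    calls.append(f"query:{title}")
    return False


result = check_if_existing("some title")
```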
Any clue?
Thanks in advance!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
