'Why is my flask app inserting the same data several times into postgres despite precautions?

I'm working on a hobby project - a chess calendar that finds strongest tournaments for me to participate in link to GitHub. It's a flask app that scrapes a tournament aggregator, finds the strongest tournaments and shows them to the visitor.

Everything works beautifully locally, but I've been having trouble with the version that I deploy on Heroku. How do I fix it?

Specifically, despite taking several precautions, my app ends up trying to insert the same data into postgres several times, which leads to a UniqueViolation error.

My precautions are the following:

  1. On startup, I truncate the tables in my db.
  2. I only insert the data if the tables are empty.
  3. The insertion happens inside a begin/commit/rollback block as advised in documentation.
  4. I drop duplicates in my data to make sure that it's not where the duplicates come from.

None of this helps though, the app still behaves like it's inserting the same data several times into the db and returning an error :(

I made sure to have the same versions of postgres both locally and on heroku.

(relevant part of) models.py:

db = SQLAlchemy()

def setup_db(app):
    database_name = 'tournaments'
    default_database_path = get_database_uri('jim_potato', os.environ.get("PASSWORD"), 'localhost:5432', database_name)
    database_path = os.getenv('DATABASE_URL', default_database_path)
    database_path = database_path.replace("postgres://", "postgresql://", 1)
    app.config["SQLALCHEMY_DATABASE_URI"] = database_path
    app.config["SQLALCHEMY_TRACK_MODIFICATIONS"] = False
    db.app = app
    db.init_app(app)
    db_truncate_or_create_all()


def db_truncate_or_create_all():
    inspector_gadget = inspect(db.engine)
    if inspector_gadget.has_table("player"):  # truncate the table if it's there
        db.session.query(Player).delete()
        db.session.query(Tournament).delete()
        db.session.commit()
    else:
        db.create_all()


class Tournament(db.Model):
    __tablename__ = 'tournament'
    id = Column(Integer, primary_key=True)
    title = Column(String(200))
    url = Column(String(80), unique=True)
    time_control = Column(String(80))
    status = Column(String(80))
    start_date = Column(db.Date)
    end_date = Column(db.Date)

(relevant part of) app.py:

def create_app(app_environment=None):
    app = Flask(__name__)
    if app_environment is None:
        app.config.from_object(config[os.getenv('FLASK_ENV', 'dev')])
    else:
        app.config.from_object(config[app_environment])
    setup_db(app)
    CORS(app)

    db.app.logger.info(f"rows in tournament: {db.session.query(Tournament).count()}")
    with Session(db.engine) as session, session.begin():
        if session.query(Tournament).first() is None:
            tournaments = pd.read_csv(TOURNAMENT_DATA_PATH).drop_duplicates(subset=['url'])
            players = pd.read_csv(FULL_DATA_PATH)
            for i, tournament_row in tournaments.iterrows():
                day, month = tournament_row.start.split('-')
                day, month = int(day), int(month)
                tournament = Tournament(
                    title=tournament_row["name"],
                    url=tournament_row["url"],
                    time_control=tournament_row["type"],
                    status=tournament_row["status"],
                    start_date=date(datetime.now().year, month, day),
                    end_date=date(datetime.now().year, month, day)
                )
                session.add(tournament)
                session.flush()

                for j, player in players[players['id'] == i].iterrows():
                    player = (Player(
                        tournament_id=tournament.id,
                        name=player['name'],
                        title=player['title'],
                        rating=player['rating'],
                        year_of_birth=player['year_of_birth']
                    ))
                    session.add(player)

error details (from heroku logs):

2022-03-25T14:45:23.014943+00:00 app[web.1]: Traceback (most recent call last):
2022-03-25T14:45:23.014944+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/gunicorn/arbiter.py", line 589, in spawn_worker
2022-03-25T14:45:23.014944+00:00 app[web.1]: worker.init_process()
2022-03-25T14:45:23.014944+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/gunicorn/workers/base.py", line 134, in init_process
2022-03-25T14:45:23.014945+00:00 app[web.1]: self.load_wsgi()
2022-03-25T14:45:23.014945+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/gunicorn/workers/base.py", line 146, in load_wsgi
2022-03-25T14:45:23.014945+00:00 app[web.1]: self.wsgi = self.app.wsgi()
2022-03-25T14:45:23.014946+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/gunicorn/app/base.py", line 67, in wsgi
2022-03-25T14:45:23.014946+00:00 app[web.1]: self.callable = self.load()
2022-03-25T14:45:23.014947+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/gunicorn/app/wsgiapp.py", line 58, in load
2022-03-25T14:45:23.014947+00:00 app[web.1]: return self.load_wsgiapp()
2022-03-25T14:45:23.014947+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/gunicorn/app/wsgiapp.py", line 48, in load_wsgiapp
2022-03-25T14:45:23.014948+00:00 app[web.1]: return util.import_app(self.app_uri)
2022-03-25T14:45:23.014948+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/gunicorn/util.py", line 412, in import_app
2022-03-25T14:45:23.014948+00:00 app[web.1]: app = app(*args, **kwargs)
2022-03-25T14:45:23.014948+00:00 app[web.1]: File "/app/app.py", line 59, in create_app
2022-03-25T14:45:23.014949+00:00 app[web.1]: session.flush()
2022-03-25T14:45:23.014949+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3363, in flush
2022-03-25T14:45:23.014949+00:00 app[web.1]: self._flush(objects)
2022-03-25T14:45:23.014950+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3503, in _flush
2022-03-25T14:45:23.014950+00:00 app[web.1]: transaction.rollback(_capture_exception=True)
2022-03-25T14:45:23.014950+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/util/langhelpers.py", line 70, in __exit__
2022-03-25T14:45:23.014951+00:00 app[web.1]: compat.raise_(
2022-03-25T14:45:23.014951+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
2022-03-25T14:45:23.014951+00:00 app[web.1]: raise exception
2022-03-25T14:45:23.014952+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/orm/session.py", line 3463, in _flush
2022-03-25T14:45:23.014952+00:00 app[web.1]: flush_context.execute()
2022-03-25T14:45:23.014952+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 456, in execute
2022-03-25T14:45:23.014952+00:00 app[web.1]: rec.execute(self)
2022-03-25T14:45:23.014953+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/orm/unitofwork.py", line 630, in execute
2022-03-25T14:45:23.014953+00:00 app[web.1]: util.preloaded.orm_persistence.save_obj(
2022-03-25T14:45:23.014953+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 244, in save_obj
2022-03-25T14:45:23.014953+00:00 app[web.1]: _emit_insert_statements(
2022-03-25T14:45:23.014954+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/orm/persistence.py", line 1221, in _emit_insert_statements
2022-03-25T14:45:23.014954+00:00 app[web.1]: result = connection._execute_20(
2022-03-25T14:45:23.014954+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1614, in _execute_20
2022-03-25T14:45:23.014954+00:00 app[web.1]: return meth(self, args_10style, kwargs_10style, execution_options)
2022-03-25T14:45:23.014955+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/sql/elements.py", line 325, in _execute_on_connection
2022-03-25T14:45:23.014955+00:00 app[web.1]: return connection._execute_clauseelement(
2022-03-25T14:45:23.014955+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1481, in _execute_clauseelement
2022-03-25T14:45:23.014955+00:00 app[web.1]: ret = self._execute_context(
2022-03-25T14:45:23.014956+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1845, in _execute_context
2022-03-25T14:45:23.014956+00:00 app[web.1]: self._handle_dbapi_exception(
2022-03-25T14:45:23.014956+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 2026, in _handle_dbapi_exception
2022-03-25T14:45:23.014956+00:00 app[web.1]: util.raise_(
2022-03-25T14:45:23.014956+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/util/compat.py", line 207, in raise_
2022-03-25T14:45:23.014957+00:00 app[web.1]: raise exception
2022-03-25T14:45:23.014957+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/engine/base.py", line 1802, in _execute_context
2022-03-25T14:45:23.014957+00:00 app[web.1]: self.dialect.do_execute(
2022-03-25T14:45:23.014964+00:00 app[web.1]: File "/app/.heroku/python/lib/python3.9/site-packages/sqlalchemy/engine/default.py", line 732, in do_execute
2022-03-25T14:45:23.014964+00:00 app[web.1]: cursor.execute(statement, parameters)
2022-03-25T14:45:23.014965+00:00 app[web.1]: sqlalchemy.exc.IntegrityError: (psycopg2.errors.UniqueViolation) duplicate key value violates unique constraint "tournament_url_key"
2022-03-25T14:45:23.014965+00:00 app[web.1]: DETAIL:  Key (url)=(http://chessarbiter.com/turnieje/open.php?turn=2022/ti_93&n=) already exists.
2022-03-25T14:45:23.014965+00:00 app[web.1]: 
2022-03-25T14:45:23.014966+00:00 app[web.1]: [SQL: INSERT INTO tournament (title, url, time_control, status, start_date, end_date) VALUES (%(title)s, %(url)s, %(time_control)s, %(status)s, %(start_date)s, %(end_date)s) RETURNING tournament.id]
2022-03-25T14:45:23.014967+00:00 app[web.1]: [parameters: {'title': 'VI Memoriał Józefa Matwija - całoroczny turniej drabinkowy', 'url': 'http://chessarbiter.com/turnieje/open.php?turn=2022/ti_93&n=', 'time_control': 'klasyczne', 'status': 'trwający', 'start_date': datetime.date(2022, 1, 5), 'end_date': datetime.date(2022, 1, 5)}]
2022-03-25T14:45:23.014967+00:00 app[web.1]: (Background on this error at: https://sqlalche.me/e/14/gkpj)
2022-03-25T14:45:23.016112+00:00 app[web.1]: [2022-03-25 14:45:23 +0000] [10] [INFO] Worker exiting (pid: 10)
2022-03-25T14:45:53.480041+00:00 app[web.1]: [2022-03-25 14:45:53 +0000] [4] [INFO] Shutting down: Master
2022-03-25T14:45:53.480084+00:00 app[web.1]: [2022-03-25 14:45:53 +0000] [4] [INFO] Reason: Worker failed to boot.
2022-03-25T14:45:53.682139+00:00 heroku[web.1]: Process exited with status 3
2022-03-25T14:45:53.745500+00:00 heroku[web.1]: State changed from up to crashed


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source