'Redshift locks table very often
I have a table "A" which runs every hour on my redshift database.It runs from an orchestration tool AIRFLOW.
The process has certain inserts in the table so in few stages we are inserting some data In the table "A".The issue is that this table "A" is being used by lot of the people for ADHOC queries, so lot of select statements run.
So it happens that generally when its the time of inserting the data in the table "A" there are also lot of select statements trying to query it and hence creates a lock.
Please advice how I can prevent this lock so that it doesn't lock it that often.
Solution 1:[1]
Thanks for all the clarification. So you have 2 issues: 1) your ETL process stalls when the "old" table is still in use AND 2) some users are accessing out of date information. You will likely need to address both but let's dig into #1 first.
To solve #1 you just need to not depend on the locks existing on the old table version. The easiest way to do this is to just add a timestamp to the table name for the "swap" step. you are running a process something like:
- new data is put in a "new" table
- you are dropping "tableA_old" (from the previous run of the process)
- in a transaction you are renaming "tableA" to "tableA_old" and renaming "new" to "tableA"
- you are dropping "tableA_old" but this drop hangs until the table is no longer in use (locked)
The problem happens when step 4 from the previous run isn't complete when the nest iteration runs. When this happens step #3 fails because "tableA_old" already exists. So to solve this don't reuse the same name for "tableA_old". Use something like "tableA_20220216_22_old" (date plus hour).
Now this will stop the renaming of the table from failing but you COULD have many previous iterations hanging on the drop in step #4. This isn't a problem if it happens at a low level (1 or 2 hours of old versions hanging around) Redshift will keep track which sessions are using which versions of tableA and airflow can do this as well. But if you have days and days of old versions hanging around this can fill up disks and steal other system resources.
This brings us to issue #2 - how are user sessions being kept up to date with the changing version of tableA? There is no limit to how long a user can keep a transaction open and thus no limit on how long they can have a read lock active or how old the data is that they are accessing. There are configurations that can be set that can help - like "idle session timeout" but this doesn't directly ensure that locks cannot be kept a very long time. To achieve this usually takes a few tactics - idle timeouts, long session daemons (kill sessions that are really long lived like > day), configuring BI tools to issues COMMITs/ROLLBACKs, use of autocommit, and training of users.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Bill Weiner |
