Lock propagation in Postgres
Leo Sjöberg • August 9, 2024
We recently encountered the error Deadlock detected (SQLSTATE 40P01)
in some of our code at work, and it took us ages to figure out what had gone wrong. I’m writing this to share the non-obvious ways in which postgres can lock a row for you, and how to avoid it.
At incident.io, we rely on our internal machine
library, which is based on gocardless/statesman
. We use this to track a bunch of different changes to entities, to ensure we have a historical record of changes made. In one of these processes, we managed to run into a deadlock, from running these seemingly innocuous queries in two processes (P1 meaning process 1, P2 meaning process 2):
1P1: SELECT * FROM escalations WHERE id = 'xyz' FOR UPDATE2P2: UPDATE escalation_transitions SET most_recent = false WHERE escalation_id = 'xyz' and most_recent3P1: UPDATE escalation_transitions SET most_recent = false WHERE escalation_id = 'xyz' and most_recent4P1: LOCKWAIT (trying to run the same query as P2, which holds a lock)5P2: INSERT INTO escalation_transitions (escalation_id, to_state, most_recent) VALUES ('xyz', 'new_state', true)6P2: DEADLOCK [killed]
This really confused me when I first saw it. We’re getting a deadlock on an INSERT
statement. A deadlock implies process 1 depends on process 2 and vice versa, but for process 2 to depend on process 1, it would have to depend on the escalation that P1 acquired a lock on, but it’s just an insert!
This confused me for a really long time. As it turns out, when you run an INSERT
statement, Postgres will acquire a lock on all entities to which you have foreign key constraints.
That is to say, because we’re inserting escalation_id
into our escalation_transitions
table, we acquire a lock against escalations
!
This was surprising to me, but what’s the lock? And importantly, why?
Understanding foreign key reference checks
When you insert a row that holds a foreign key, Postgres needs to verify that the key you’re inserting is valid. To do that, Postgres will acquire a lock FOR KEY SHARE
, but what does this lock do?
A KEY SHARE
lock is telling Postgres “you can update this row, but not any keys that may be referenced”. In Postgres, foreign keys must reference some unique column (primary key or otherwise), so when you take a FOR KEY SHARE
lock, you allow other processes to update the row, as long as they don’t touch any unique columns.
That means another process can run an UPDATE
query (but not DELETE
), as long as it doesn’t modify row-unique keys. And this makes sense - in order to verify that the foreign key is valid, Postgres has to ensure that, by the end of the transaction, the key is still valid, so no other transaction is allowed to change it.
How our deadlock happened
In our case, this lock propagation caused a deadlock. Remember, the first thing that P1
does is acquire a lock by calling SELECT FOR UPDATE
. This places a lock on the row in escalations
, and allows the process that locked it to make any modifications, including changing the primary key. This means that no other process can take any lock, not even a key share lock, against the row.
So let’s go through the steps again, and see where we hit a deadlock:
- Process 1 executes
SELECT * FROM escalations WHERE id = 'xyz' FOR UPDATE
- That row in
escalations
now has aFOR UPDATE
lock on it. Anyone who wants to do anything relating to the escalation will have to wait for the transaction to finish!
- That row in
- Process 2 executes
UPDATE escalation_transitions SET most_recent = false WHERE escalation_id = 'xyz' and most_recent
- This acquires a
FOR UPDATE
lock onescalation_transitions
against all rows matching the constraint within this transaction
- This acquires a
- Process 1 starts executing
UPDATE escalation_transitions SET most_recent = false WHERE escalation_id = 'xyz' and most_recent
- This attempts to acquire the same
FOR UPDATE
lock that Process 2 does, but it’s already locked
- This attempts to acquire the same
- Process 1 enters a
LOCKWAIT
, waiting for Process 2 to release its lock on the rows inescalation_transitions
- Process 2 starts executing
INSERT INTO escalation_transitions (escalation_id, to_state, most_recent) VALUES ('xyz', 'new_state', true)
- This attempts to acquire a
FOR KEY SHARE
lock against the row inescalations
that is referenced in theINSERT
statement.
- This attempts to acquire a
- 💥
DEADLOCK [killed]
We’re now in a deadlock!- Process 1 is waiting for Process 2 to release its lock on
escalation_transitions
, while Process 2 is waiting for Process 1 to release its lock onescalations
.
- Process 1 is waiting for Process 2 to release its lock on
Fixing it
In our case, we realised we wanted these two processes to never run concurrently, and so we simply modified Process 2 to start with the same SELECT FOR UPDATE
as Process 1. That means any time this happens in the future, Process 2 will effectively wait (through LOCKWAIT
) for all of Process 1 to finish before attempting to run any queries.
More generally, however, throwing FOR UPDATE
s in all over is a bad idea, as it reduces your ability to concurrently handle work. Instead, if you need to lock against a resource, you should use a FOR NO KEY UPDATE
lock. This will allow other processes to acquire a FOR KEY SHARE
lock, but nothing else. It means you’re safe from other processes updating the row, but ensures that you can still insert new rows in other tables that reference the row you’re locking on.