'UPDATE PostgreSQL table with values from self
I am attempting to update multiple columns on a table with values from another row in the same table:
CREATE TEMP TABLE person (
pid INT
, name VARCHAR(40)
, dob DATE
, younger_sibling_name VARCHAR(40)
, younger_sibling_dob DATE
);
INSERT INTO person VALUES (pid, name, dob)
(1, 'John' , '1980-01-05')
, (2, 'Jimmy', '1975-04-25')
, (3, 'Sarah', '2004-02-10')
, (4, 'Frank', '1934-12-12')
;
The task is to populate younger_sibling_name
and younger_sibling_dob
with the name and birthday of the person that is closest to them in age, but not older or the same age.
I can set the younger sibling dob
easily because this is the value that determines the record to use with a correlated subquery (I think this is an example of that?):
UPDATE person SET younger_sibling_dob = (
SELECT MAX(dob)
FROM person AS sibling
WHERE sibling.dob < person.dob);
I just can't see any way to get the name
?
The real query of this will run over about 1M rows in groups of 100-500 for each MAX selection so performance is a concern.
Edit
After trying many different approaches, I've decided on this one which I think is a good balance of being able to verify the data with the intermediate result, shows the intention of what the logic is, and performs adequately:
WITH sibling AS (
SELECT person.pid, sibling.dob, sibling.name,
row_number() OVER (PARTITION BY person.pid
ORDER BY sibling.dob DESC) AS age_closeness
FROM person
JOIN person AS sibling ON sibling.dob < person.dob
)
UPDATE person
SET younger_sibling_name = sibling.name
,younger_sibling_dob = sibling.dob
FROM sibling
WHERE person.pid = sibling.pid
AND sibling.age_closeness = 1;
SELECT * FROM person ORDER BY dob;
Solution 1:[1]
1) Finding the MAX() can alway be rewritten in terms of NOT EXISTS (...)
UPDATE person dst
SET younger_sibling_name = src.name
,younger_sibling_dob = src.dob
FROM person src
WHERE src.dob < dst.dob
OR src.dob = dst.dob AND src.pid < dst.pid
AND NOT EXISTS (
SELECT * FROM person nx
WHERE nx.dob < dst.dob
OR nx.dob = dst.dob AND nx.pid < dst.pid
AND nx.dob > src.dob
OR nx.dob = src.dob AND nx.pid > src.pid
);
2) Instead of rank() / row_number(), you could also use a LAG() function over the WINDOW:
UPDATE person dst
SET younger_sibling_name = src.name
,younger_sibling_dob = src.dob
FROM (
SELECT pid
, LAG(name) OVER win AS name
, LAG(dob) OVER win AS dob
FROM person
WINDOW win AS (ORDER BY dob, pid)
) src
WHERE src.pid = dst.pid
;
Both versions require a self-joined subquery (or CTE) because UPDATE does not allow window functions.
Solution 2:[2]
To get the dob and name, you can do:
update person
set younger_sibling_dob = (select dob
from person p2
where s.dob < person.dob
order by dob desc
limit 1),
younger_sibling_name = (select name
from person p2
where s.dob < person.dob
order by dob desc
limit 1)
If you have an index on dob
, then the query will run faster.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | Gordon Linoff |