'UPDATE PostgreSQL table with values from self

I am attempting to update multiple columns on a table with values from another row in the same table:

CREATE TEMP TABLE person (
  pid INT
, name VARCHAR(40)
, dob DATE
, younger_sibling_name VARCHAR(40)
, younger_sibling_dob DATE
);

INSERT INTO person VALUES (pid, name, dob)
  (1, 'John' , '1980-01-05')
, (2, 'Jimmy', '1975-04-25')
, (3, 'Sarah', '2004-02-10')
, (4, 'Frank', '1934-12-12')
;

The task is to populate younger_sibling_name and younger_sibling_dob with the name and birthday of the person that is closest to them in age, but not older or the same age.

I can set the younger sibling dob easily because this is the value that determines the record to use with a correlated subquery (I think this is an example of that?):

UPDATE person SET younger_sibling_dob = (
SELECT MAX(dob)
FROM person AS sibling
WHERE sibling.dob < person.dob);

I just can't see any way to get the name?
The real query of this will run over about 1M rows in groups of 100-500 for each MAX selection so performance is a concern.

Edit

After trying many different approaches, I've decided on this one which I think is a good balance of being able to verify the data with the intermediate result, shows the intention of what the logic is, and performs adequately:

WITH sibling AS (
  SELECT person.pid, sibling.dob, sibling.name,
         row_number() OVER (PARTITION BY person.pid
                            ORDER BY sibling.dob DESC) AS age_closeness
  FROM person
  JOIN person AS sibling ON sibling.dob < person.dob
)
UPDATE person
  SET younger_sibling_name = sibling.name
     ,younger_sibling_dob  = sibling.dob
FROM sibling
WHERE person.pid = sibling.pid
   AND sibling.age_closeness = 1;

SELECT * FROM person ORDER BY dob;


Solution 1:[1]

1) Finding the MAX() can alway be rewritten in terms of NOT EXISTS (...)

UPDATE person dst
SET younger_sibling_name = src.name
        ,younger_sibling_dob = src.dob
FROM person src
WHERE src.dob < dst.dob
   OR src.dob = dst.dob AND src.pid < dst.pid
AND NOT EXISTS (
        SELECT * FROM person nx
        WHERE nx.dob < dst.dob
           OR nx.dob = dst.dob AND nx.pid < dst.pid
        AND nx.dob > src.dob
           OR nx.dob = src.dob AND nx.pid > src.pid
        );

2) Instead of rank() / row_number(), you could also use a LAG() function over the WINDOW:

UPDATE person dst
SET younger_sibling_name = src.name
        ,younger_sibling_dob = src.dob
FROM    (
        SELECT pid
        , LAG(name) OVER win AS name
        , LAG(dob) OVER win AS dob 
        FROM person
        WINDOW win AS (ORDER BY dob, pid)
        ) src
WHERE src.pid = dst.pid
        ;

Both versions require a self-joined subquery (or CTE) because UPDATE does not allow window functions.

Solution 2:[2]

To get the dob and name, you can do:

update person
    set younger_sibling_dob = (select dob
                               from person p2
                               where s.dob < person.dob
                               order by dob desc
                               limit 1),
       younger_sibling_name = (select name
                               from person p2
                               where s.dob < person.dob
                               order by dob desc
                               limit 1)

If you have an index on dob, then the query will run faster.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Gordon Linoff