'Full outer join not giving the answer I need

I am using PostgreSQL and am having difficulty with getting a series of queries that combine the data from two tables (t1, t2)

t1 is

studyida gender age
a M 1
a M 2
a M 3
b F 4
b F 5
b F 6
c M 13
c M 14
c M 15

and t2 is

studyida studyidb gender age
a z M 3
a z M 4
a z M 5
NULL y F 7
NULL y F 8
NULL y F 9
c x M 10
c x M 11
c x M 12
NULL w F 7
NULL w F 8
NULL w F 9
NULL u M 7
NULL u M 8
NULL u M 9

t1 and t2 are related via StudyIDA and gender. What I need is a comprehensive listing from both tables, including including the ages. Sometimes the age in t1 equals the age in t2 (e.g. for StudyIDA=a, age=3) but most of the time it does not.

I am looking to create a table like this

StudyIDA StudyIDB gender ageA ageB
a z M 1
a z M 2
a z M 3 3
a z M 4
a z M 5
b NULL F 4
b NULL F 5
b NULL F 6
NULL y F 7
NULL y F 8
NULL y F 9
c x F 13
c x F 14
c x F 15
c x F 10
c x F 11
c x F 12
NULL w F 7
NULL w F 8
NULL w F 9
NULL u M 7
NULL u M 8
NULL u M 9

I was thinking that first a full outer join of t1 and t2 would give me what I want but it does not.

Then I was thinking I need a listing of all the individuals (lets call it t3), and then do a series of inserts (e.g. t1+t3 and also t1+t3) into a new table to 'construct' what I need. I am really stuck on the odd times when age in t1 equals the age in t2 (e.g. for StudyIDA=a, age=3).

I am still not getting what I need. Here is my code so far

DROP TABLE IF EXISTS t1, t2, t3;

CREATE TEMPORARY TABLE t1 (StudyIDA VARCHAR, gender VARCHAR, age int);
INSERT INTO t1 VALUES 
    ('a','M', 1),('a','M', 2),('a','M', 3),
    ('b','F', 4),('b','F', 5),('b','F', 6),
    ('c','M', 13),('c','M', 14),('c','M', 15);
SELECT * FROM t1;

CREATE TEMPORARY TABLE t2 (StudyIDA VARCHAR, StudyIDB varchar, gender VARCHAR, age int);
INSERT INTO t2 VALUES 
    ('a','z','M', 3), ('a','z','M', 4), ('a','z','M', 5),
    (NULL,'y','F', 7),(NULL,'y','F', 8),(NULL,'y','F', 9),
    ('c','x','M', 10),('c','x','M', 11),('c','x','M', 12),
    (NULL,'w','F', 7),(NULL,'w','F', 8),(NULL,'w','F', 9),
    (NULL,'u','M', 7),(NULL,'u','M', 8),(NULL,'u','M', 9);
SELECT * FROM t2;

CREATE TEMPORARY TABLE t3 (StudyIDA_t1 VARCHAR, gender_t1 VARCHAR, StudyIDA_t2 VARCHAR,StudyIDB varchar,
    gender_t2 VARCHAR);

INSERT INTO t3
    SELECT * FROM (SELECT DISTINCT StudyIDA, gender FROM t1) a FULL OUTER JOIN 
    (SELECT DISTINCT StudyIDA, StudyIDB, gender FROM t2) b 
    ON a.StudyIDA=b.StudyIDA AND a.gender=b.gender
    ORDER BY a.StudyIDA;

SELECT * FROM t3 ORDER BY StudyIDA_t1;


SELECT 'IN t1', * 
    FROM t3 JOIN t1 on t1.StudyIDA=t3.StudyIDA_t1 AND t1.gender=t3.gender_t1
    ORDER BY StudyIDA_t1, StudyIDB;

SELECT 'In t2',* 
    FROM t3 JOIN t2 on t3.StudyIDA_t1=t2.StudyIDA AND t3.gender_t1=t2.gender
    ORDER BY StudyIDA_t1, t3.StudyIDB;

DROP TABLE IF EXISTS t1, t2, t3;


Solution 1:[1]

Your sample data indicates that only t2.studyida can be NULL and all other columns should really be declared as NOT NULL.
If so, I suggest this simpler query:

SELECT studyida, b.studyidb, gender, age
     , CASE WHEN a.age IS NULL THEN 'b'
            WHEN b.age IS NULL THEN 'a'
                               ELSE 'a and b' END as source
FROM   t1 a
FULL   JOIN t2 b USING (studyida, gender, age)
ORDER  BY studyida, gender, age;

db<>fiddle here

The USING clause is convenient for identically named join columns. Only a single instance of the joining column is in the result set, effectively what COALESCE(a.col, b.col) gives you otherwise. (You might just use SELECT *.)
You can still reference source columns with table-qualification, like a.age.

I reduced to a single age column and added source. You may or may not want that.

Either way, "age" is subject to bitrot, almost always the wrong choice for a table column, and should typically be replaced by "birthday" or similar.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Erwin Brandstetter