'Full outer join not giving the answer I need

I am using PostgreSQL and am having difficulty with getting a series of queries that combine the data from two tables (t1, t2)

t1 is

studyida	gender	age
a	M	1
a	M	2
a	M	3
b	F	4
b	F	5
b	F	6
c	M	13
c	M	14
c	M	15

and t2 is

studyida	studyidb	gender	age
a	z	M	3
a	z	M	4
a	z	M	5
NULL	y	F	7
NULL	y	F	8
NULL	y	F	9
c	x	M	10
c	x	M	11
c	x	M	12
NULL	w	F	7
NULL	w	F	8
NULL	w	F	9
NULL	u	M	7
NULL	u	M	8
NULL	u	M	9

t1 and t2 are related via StudyIDA and gender. What I need is a comprehensive listing from both tables, including including the ages. Sometimes the age in t1 equals the age in t2 (e.g. for StudyIDA=a, age=3) but most of the time it does not.

I am looking to create a table like this

StudyIDA	StudyIDB	gender	ageA	ageB
a	z	M	1
a	z	M	2
a	z	M	3	3
a	z	M		4
a	z	M		5
b	NULL	F	4
b	NULL	F	5
b	NULL	F	6
NULL	y	F		7
NULL	y	F		8
NULL	y	F		9
c	x	F	13
c	x	F	14
c	x	F	15
c	x	F		10
c	x	F		11
c	x	F		12
NULL	w	F		7
NULL	w	F		8
NULL	w	F		9
NULL	u	M		7
NULL	u	M		8
NULL	u	M		9

I was thinking that first a full outer join of t1 and t2 would give me what I want but it does not.

Then I was thinking I need a listing of all the individuals (lets call it t3), and then do a series of inserts (e.g. t1+t3 and also t1+t3) into a new table to 'construct' what I need. I am really stuck on the odd times when age in t1 equals the age in t2 (e.g. for StudyIDA=a, age=3).

I am still not getting what I need. Here is my code so far

DROP TABLE IF EXISTS t1, t2, t3;

CREATE TEMPORARY TABLE t1 (StudyIDA VARCHAR, gender VARCHAR, age int);
INSERT INTO t1 VALUES 
    ('a','M', 1),('a','M', 2),('a','M', 3),
    ('b','F', 4),('b','F', 5),('b','F', 6),
    ('c','M', 13),('c','M', 14),('c','M', 15);
SELECT * FROM t1;

CREATE TEMPORARY TABLE t2 (StudyIDA VARCHAR, StudyIDB varchar, gender VARCHAR, age int);
INSERT INTO t2 VALUES 
    ('a','z','M', 3), ('a','z','M', 4), ('a','z','M', 5),
    (NULL,'y','F', 7),(NULL,'y','F', 8),(NULL,'y','F', 9),
    ('c','x','M', 10),('c','x','M', 11),('c','x','M', 12),
    (NULL,'w','F', 7),(NULL,'w','F', 8),(NULL,'w','F', 9),
    (NULL,'u','M', 7),(NULL,'u','M', 8),(NULL,'u','M', 9);
SELECT * FROM t2;

CREATE TEMPORARY TABLE t3 (StudyIDA_t1 VARCHAR, gender_t1 VARCHAR, StudyIDA_t2 VARCHAR,StudyIDB varchar,
    gender_t2 VARCHAR);

INSERT INTO t3
    SELECT * FROM (SELECT DISTINCT StudyIDA, gender FROM t1) a FULL OUTER JOIN 
    (SELECT DISTINCT StudyIDA, StudyIDB, gender FROM t2) b 
    ON a.StudyIDA=b.StudyIDA AND a.gender=b.gender
    ORDER BY a.StudyIDA;

SELECT * FROM t3 ORDER BY StudyIDA_t1;


SELECT 'IN t1', * 
    FROM t3 JOIN t1 on t1.StudyIDA=t3.StudyIDA_t1 AND t1.gender=t3.gender_t1
    ORDER BY StudyIDA_t1, StudyIDB;

SELECT 'In t2',* 
    FROM t3 JOIN t2 on t3.StudyIDA_t1=t2.StudyIDA AND t3.gender_t1=t2.gender
    ORDER BY StudyIDA_t1, t3.StudyIDB;

DROP TABLE IF EXISTS t1, t2, t3;

Solution 1:^[1]

Your sample data indicates that only t2.studyida can be NULL and all other columns should really be declared as NOT NULL.
If so, I suggest this simpler query:

SELECT studyida, b.studyidb, gender, age
     , CASE WHEN a.age IS NULL THEN 'b'
            WHEN b.age IS NULL THEN 'a'
                               ELSE 'a and b' END as source
FROM   t1 a
FULL   JOIN t2 b USING (studyida, gender, age)
ORDER  BY studyida, gender, age;

db<>fiddle here

The USING clause is convenient for identically named join columns. Only a single instance of the joining column is in the result set, effectively what COALESCE(a.col, b.col) gives you otherwise. (You might just use SELECT *.)
You can still reference source columns with table-qualification, like a.age.

I reduced to a single age column and added source. You may or may not want that.

Either way, "age" is subject to bitrot, almost always the wrong choice for a table column, and should typically be replaced by "birthday" or similar.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Erwin Brandstetter

'Full outer join not giving the answer I need

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]