'azure sql transformation for titanic kaggle

i am trying to impute rows with missing age data using the apply sql transformation in azure ML studio.

i did a simple group by function to determine median age by pclass and sex

select t1.pclass, t1.sex, median(t1.age) 
from t1
group by t1.pclass, t1.sex
;

how do i bring this group by results for median age over to my train.csv so that i can impute rows with missing age values with the median age based on their pclass and gender classification? not quite sure how to continue the sql codes as it is my first attempt learning sql.

select t1.*, median(t1.age) as age_revised
from t1
where t1.pclass = 1 and lower(t1.sex) = 'male'
;


Solution 1:[1]

You could calculate the medians in a subquery and use it as a table, and in a JOIN provide it to the fields that do not have it, and import the data. For example if they are NULL.

SELECT t1.ID, t1.Otros, t1.pclass, t1.sex, ISNULL(t1.age, t2.age) 
FROM t1
INNER JOIN
    (SELECT pclass, sex, median(age)
    FROM t1 WHERE age IS NOT NULL
    GROUP BY pclass, sex) as t2
ON t1.pclass = t2.pclass AND t1.sex = t2.sex

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Jeremy Caney