'Write a query to add a flag to each user if they've added a new employer in the current year
employers table
column type
user_id integer
year string
employer_ein string
We're given a table called employers that consists of a user_id, year, and employer EIN label. Users can have multiple employers dictated by the different EIN labels. Write a query to add a flag to each user if they've added a new employer in the current year.
Example:
employer
user_id year employer_ein
# 34323 2018 A
# 34323 2018 B
# 34323 2018 C
# 34323 2017 F
# 34323 2017 A
# 34323 2017 B
# 86323 2018 A
# 86323 2018 B
# 86323 2018 C
# 86323 2017 B
#
# 98787 2018 A
# 98787 2018 B
# 98787 2018 F
# 98787 2017 F
# 98787 2017 B
# 98787 2017 A
#
# 55559 2018 A
# 55559 2018 B
# 55559 2018 C
# Output
# user_id year new_ein_flag
# 34323 2018 1
# 86323 2018 1
# 98787 2018 0
Solution 1:[1]
You can do this using window functions and aggregation:
select user_id, year,
max(case when min_year = year then 1 else 0 end) as new_ein_flag
from (select t.*,
min(year) over (partition by user_id, ein) as min_year
from t
) t
where year = 2018
group by user_id, year;
The inner query calculates the minimum year per user_id/ein pair. The outer query only keeps rows from the current year and calculates the flag if any eins have their minimum as the current year.
EDIT:
If you have to figure out the most recent year, just use window functions again:
select user_id, year,
max(case when min_year = year then 1 else 0 end) as new_ein_flag
from (select t.*, max(year) over () as max_year,
min(year) over (partition by user_id, ein) as min_year
from t
) t
where year = max_year
group by user_id, year;
Solution 2:[2]
select user_id,max(year) as year, max(sumrow) as maxsumrow,
(case min(sumrow) = 1 and max(sumrow) = 2 when TRUE then 1 when FALSE then 0 end) as new_ein_flag
from
(SELECT user_id,count(*) as sumrow, max(year) as year,employer_ein FROM `ein` GROUP BY user_id,employer_ein) as grouptable
where year = (select max(year) as maxyear from ein)
group by user_id
having max(sumrow)=2
I tried it. it is screenshot;
Solution 3:[3]
if you're tired from the analytical functions
WITH
max_date_for_cross_join AS (
SELECT MAX(dt) AS m_dt FROM employer ),
stg AS (
SELECT user_id AS user_id
, MAX(s.m_dt) AS year
, CASE WHEN MIN(e.dt)=MAX(e.dt) AND MAX(e.dt) = MAX(s.m_dt) THEN 1 ELSE 0 END AS new_ein_flag
, MIN(e.dt) AS min_dt
FROM employer e
CROSS JOIN
max_date_for_cross_join s
GROUP BY user_id
, employer_ein)
SELECT user_id AS user_id
, MAX(year) AS year
, MAX(new_ein_flag) AS new_ein_flag
FROM stg
GROUP BY user_id
HAVING SUM(CASE WHEN min_dt < year THEN 1 ELSE 0 END) >= 1
ORDER BY 1
Solution 4:[4]
With res AS(
select A.User_id, A.year, count(A.employer_ein) as cnt
from Users A left join Users B on A.User_id=B.User_id and A.year=B.year+1 and A.employer_ein=B.employer_ein
where B.user_id is null and A.year=2018 GROUP by A.User_id,A.year )
select distinct case when R.user_id is null then U.user_id when R.user_id is not null then R.user_id end as userid,
case when R.year is null then U.year when R.year is not null then R.year end as year,
case when R.cnt is null then 0 when cnt>0 then 1 end as new_ein_flag from res R right join Users U on R.user_id=U.user_id
where U.year=2018
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | |
| Solution 4 | HuDa M |

