'How to get the top 1% of soloists calculated by the total number of concerts performed
I'm working on this sql challenge, but I don't know if the way to get the top 1% of soloists calculated by the total number of concerts performed. is correct. Please help me, thank you. Below are the challenge and my code.
1. Orchestra Celebration
The New York Philharmonic is one of America's largest orchestras. In honor of its many famous musicians, the director is planning a special event. She wants to hold a concert to celebrate the top soloists from its history. You have been asked to determine which soloists should receive recognition.The director has given you the following requirements:
The results should only include the top 1% of soloists calculated by the total number of concerts performed.
Limit your results to entries where the orchestra is 'New York Philharmonic' and the event type is 'Subscription Season'. Soloist performances in other orchestras or event types should not count towards the calculation of the top soloists.
The director is only interested in individual soloists. You will need to exclude all of the following soloists: 'Chorus', 'No Soloist', 'New York Choral Artists', and 'Schola Cantorum of NY'. You should
also exclude any soloist with 'choir' in the name.
Your result should contain the following columns. It should meet all requirements as described.
column requirements
- name The name of the soloist, with the first name followed by the last name (e.g., Jane Smith). Please note that in the soloists table, names are in reverse order (last name, first name).
- first_date The first date the soloist ever performed with the orchestra, in the format '01 Jan 2015' (i.e., month as an integer, short month name, and year as an integer).
- last_date The last date the soloist ever performed with the orchestra, in the format '01 Jan 2015' (i.e., month as an integer, short month name, and year as an integer).
- total_concerts The total number of concerts the soloist performed.
Order your results by the total number of concerts performed in descending order, and then by soloist name in alphabetical order. The data you will need is available in the two tables detailed in the schema below. This is a picture of the schema This is the dataset
Below is my code and I don't know where I'm wrong, please help me, thank you all in advance
SELECT
concat(substring(soloist_name from position(',' in soloist_name)+1 for char_length(soloist_name)),
' ',
substring(soloist_name from 0 for position(',' in soloist_name))) as name,
To_char(min(date)::date, 'DD Mon YYYY') as first_date,
To_char(max(date)::date, 'DD Mon YYYY') as last_date,
count(concertnumber) as total_concerts
FROM soloists join concerts using (id)
WHERE
orchestra LIKE 'New York Philharmonic'
and eventtype LIKE 'Subscription Season'
and soloist_name NOT IN ('Chorus',
'No Soloist',
'New York Choral Artists',
'Schola Cantorum of NY')
and soloist_name NOT LIKE ('%Choir%')
GROUP BY name
ORDER BY total_concerts desc, name
limit 29
This is how I get 1% of the soloists (29 out of 2887), and this is where I think I am wrong, but don't know how to fix.
Solution 1:[1]
This is the solution that worked for me, I've used a subquery for the 1% limit because I don't like hardcoding the limit.
%%sql
postgresql:///orchestra
SELECT
CONCAT(TRIM(BOTH FROM SUBSTRING(s.soloist_name, STRPOS(s.soloist_name,',') + 2)),' ',TRIM(BOTH FROM SUBSTRING(s.soloist_name,0,STRPOS(s.soloist_name,',')))) AS name,
TO_CHAR(MIN(c.date)::date,'DD Mon YYYY') AS first_date,
TO_CHAR(MAX(c.date)::date,'DD Mon YYYY') AS last_date,
COUNT(s.id) AS total_concerts
FROM
soloists s
JOIN
concerts c
ON
s.id = c.id
WHERE
c.orchestra = 'New York Philharmonic'
AND
c.eventtype = 'Subscription Season'
AND
s.soloist_name NOT IN ('Chorus','No Soloist','New York Choral Artists','Schola Cantorum of NY')
AND
LOWER(s.soloist_name) NOT LIKE '%choir%'
GROUP BY
s.soloist_name
ORDER BY
total_concerts DESC,
name ASC
LIMIT(
SELECT(
COUNT(name)* 0.01)::integer
FROM(SELECT
s.soloist_name AS name,
COUNT(s.id) AS total_concerts
FROM
soloists s
JOIN
concerts c
ON
s.id = c.id
WHERE
c.orchestra = 'New York Philharmonic'
AND
c.eventtype = 'Subscription Season'
AND
s.soloist_name NOT IN ('Chorus','No Soloist','New York Choral Artists','Schola Cantorum of NY')
AND
LOWER(s.soloist_name) NOT LIKE '%choir%'
GROUP BY
s.soloist_name) AS s);
Let me know if it worked for you
Solution 2:[2]
As an example with the RANK windowing function :
WITH
T AS
(
SELECT
concat(substring(soloist_name from position(',' in soloist_name)+1 for char_length(soloist_name)),
' ',
substring(soloist_name from 0 for position(',' in soloist_name))) as name,
To_char(min(date)::date, 'DD Mon YYYY') as first_date,
To_char(max(date)::date, 'DD Mon YYYY') as last_date,
sum(concertnumber) as total_concerts,
RANK() OVER(ORDER BY sum(concertnumber) DESC) AS RANK_SOLOIST,
COUNT(*) AS NUMBER
FROM soloists join concerts using (id)
WHERE
orchestra LIKE 'New York Philharmonic'
and eventtype LIKE 'Subscription Season'
and soloist_name NOT IN ('Chorus',
'No Soloist',
'New York Choral Artists',
'Schola Cantorum of NY')
and soloist_name NOT LIKE ('%Choir%')
GROUP BY name
HAVING sum(concertnumber) <> 0
)
SELECT *
FROM T
WHERE RANK_SOLOIST <= NUMBER / 100.0
Solution 3:[3]
Just wanted to share my answer too. Alot of helped received from the responses by Gonzalo and SQLPro. Thx!
%%sql
postgresql:///orchestra
SELECT
CONCAT(TRIM(BOTH FROM SUBSTRING(s.soloist_name,STRPOS(s.soloist_name,',')+2)),
' ',
TRIM(BOTH FROM SUBSTRING(s.soloist_name,0,STRPOS(s.soloist_name,',')))) AS name,
TO_CHAR(MIN(c.date) :: DATE, 'dd Mon yyyy') AS first_date,
TO_CHAR(MAX(c.date) :: DATE, 'dd Mon yyyy') AS last_date,
COUNT(s.id) AS total_concerts
FROM concerts AS c
JOIN soloists AS s
ON c.id = s.id
WHERE c.orchestra = 'New York Philharmonic'
AND c.eventtype = 'Subscription Season'
AND NOT s.soloist_name = 'Chorus'
AND NOT s.soloist_name = 'No Soloist'
AND NOT s.soloist_name = 'New York Choral Artists'
AND NOT s.soloist_name = 'Schola Cantorum of NY'
AND NOT s.soloist_name ~* 'choir'
GROUP BY s.soloist_name
ORDER BY total_concerts DESC, name ASC
LIMIT(
SELECT CAST(COUNT(extra.name)*0.01 AS integer)
FROM(
SELECT
s.soloist_name AS name,
COUNT(s.id) AS total_concerts
FROM concerts AS c
JOIN soloists AS s
ON c.id = s.id
WHERE c.orchestra = 'New York Philharmonic'
AND c.eventtype = 'Subscription Season'
AND NOT s.soloist_name = 'Chorus'
AND NOT s.soloist_name = 'No Soloist'
AND NOT s.soloist_name = 'New York Choral Artists'
AND NOT s.soloist_name = 'Schola Cantorum of NY'
AND NOT s.soloist_name ~* 'choir'
GROUP BY s.soloist_name) AS extra);
Solution 4:[4]
This solution should work too. Slightly inspired by the previous answers.
%%sql
postgresql:///orchestra
SELECT
SPLIT_PART(soloist_name,',',2) || ' ' || SPLIT_PART(soloist_name,',',1) AS name,
TO_CHAR(MIN(date)::date, 'DD Mon YYYY') AS first_date,
TO_CHAR(MAX(date)::date, 'DD Mon YYYY') AS last_date,
COUNT(*) AS total_concerts
FROM concerts c
JOIN soloists s
ON c.id = s.id
WHERE c.orchestra = 'New York Philharmonic'
AND c.eventtype = 'Subscription Season'
AND s.soloist_name NOT IN ('Chorus','No Soloist','New York Choral Artists','Schola Cantorum of NY')
AND LOWER(s.soloist_name) NOT LIKE '%choir%'
GROUP BY soloist_name
ORDER BY total_concerts DESC, name ASC
LIMIT (
SELECT COUNT(*) / 100 as onepercent
FROM (
SELECT s.soloist_name
FROM concerts c
JOIN soloists s
ON c.id = s.id
WHERE c.orchestra = 'New York Philharmonic'
AND c.eventtype = 'Subscription Season'
AND s.soloist_name NOT IN ('Chorus','No Soloist','New York Choral Artists','Schola Cantorum of NY')
AND LOWER(s.soloist_name) NOT LIKE '%choir%'
GROUP BY soloist_name
) AS s
)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | SQLpro |
| Solution 3 | ripalo |
| Solution 4 | Gallawander |
