'How to get the top 1% of soloists calculated by the total number of concerts performed

I'm working on this sql challenge, but I don't know if the way to get the top 1% of soloists calculated by the total number of concerts performed. is correct. Please help me, thank you. Below are the challenge and my code.

1. Orchestra Celebration

The New York Philharmonic is one of America's largest orchestras. In honor of its many famous musicians, the director is planning a special event. She wants to hold a concert to celebrate the top soloists from its history. You have been asked to determine which soloists should receive recognition.The director has given you the following requirements:

  • The results should only include the top 1% of soloists calculated by the total number of concerts performed.

  • Limit your results to entries where the orchestra is 'New York Philharmonic' and the event type is 'Subscription Season'. Soloist performances in other orchestras or event types should not count towards the calculation of the top soloists.

  • The director is only interested in individual soloists. You will need to exclude all of the following soloists: 'Chorus', 'No Soloist', 'New York Choral Artists', and 'Schola Cantorum of NY'. You should
    also exclude any soloist with 'choir' in the name.

Your result should contain the following columns. It should meet all requirements as described.

column requirements

  1. name The name of the soloist, with the first name followed by the last name (e.g., Jane Smith). Please note that in the soloists table, names are in reverse order (last name, first name).
  2. first_date The first date the soloist ever performed with the orchestra, in the format '01 Jan 2015' (i.e., month as an integer, short month name, and year as an integer).
  3. last_date The last date the soloist ever performed with the orchestra, in the format '01 Jan 2015' (i.e., month as an integer, short month name, and year as an integer).
  4. total_concerts The total number of concerts the soloist performed.

Order your results by the total number of concerts performed in descending order, and then by soloist name in alphabetical order. The data you will need is available in the two tables detailed in the schema below. This is a picture of the schema This is the dataset

Below is my code and I don't know where I'm wrong, please help me, thank you all in advance

SELECT 
    concat(substring(soloist_name from position(',' in soloist_name)+1 for char_length(soloist_name)),
           ' ', 
           substring(soloist_name from 0 for position(',' in soloist_name))) as name,
    To_char(min(date)::date, 'DD Mon YYYY') as first_date,
    To_char(max(date)::date, 'DD Mon YYYY') as last_date,
    count(concertnumber) as total_concerts
     
FROM soloists join concerts using (id)
WHERE 
    orchestra LIKE 'New York Philharmonic' 
    and eventtype LIKE 'Subscription Season' 
    and soloist_name NOT IN ('Chorus', 
                             'No Soloist', 
                             'New York Choral Artists', 
                             'Schola Cantorum of NY')
    and soloist_name NOT LIKE ('%Choir%')
  
GROUP BY name
ORDER BY total_concerts desc, name
limit 29

This is how I get 1% of the soloists (29 out of 2887), and this is where I think I am wrong, but don't know how to fix.



Solution 1:[1]

This is the solution that worked for me, I've used a subquery for the 1% limit because I don't like hardcoding the limit.

%%sql
postgresql:///orchestra
    
SELECT
    CONCAT(TRIM(BOTH FROM SUBSTRING(s.soloist_name, STRPOS(s.soloist_name,',') + 2)),' ',TRIM(BOTH FROM SUBSTRING(s.soloist_name,0,STRPOS(s.soloist_name,',')))) AS name,
    TO_CHAR(MIN(c.date)::date,'DD Mon YYYY') AS first_date,
    TO_CHAR(MAX(c.date)::date,'DD Mon YYYY') AS last_date,
    COUNT(s.id) AS total_concerts
FROM
    soloists s
JOIN
    concerts c
ON
    s.id = c.id
WHERE
    c.orchestra = 'New York Philharmonic'
AND
    c.eventtype = 'Subscription Season'
AND
    s.soloist_name NOT IN ('Chorus','No Soloist','New York Choral Artists','Schola Cantorum of NY')
AND
    LOWER(s.soloist_name) NOT LIKE '%choir%'
GROUP BY
    s.soloist_name
ORDER BY
    total_concerts DESC,
    name ASC
LIMIT(
    SELECT(
        COUNT(name)* 0.01)::integer
        FROM(SELECT
                s.soloist_name AS name,
                COUNT(s.id) AS total_concerts
             FROM
                soloists s
             JOIN
                concerts c
             ON
                s.id = c.id
             WHERE
                c.orchestra = 'New York Philharmonic'
             AND
                c.eventtype = 'Subscription Season'
             AND
                s.soloist_name NOT IN ('Chorus','No Soloist','New York Choral Artists','Schola Cantorum of NY')
             AND
                LOWER(s.soloist_name) NOT LIKE '%choir%'
             GROUP BY
                s.soloist_name) AS s);

Let me know if it worked for you

Solution 2:[2]

As an example with the RANK windowing function :

WITH 
T AS
(
SELECT 
    concat(substring(soloist_name from position(',' in soloist_name)+1 for char_length(soloist_name)),
           ' ', 
           substring(soloist_name from 0 for position(',' in soloist_name))) as name,
    To_char(min(date)::date, 'DD Mon YYYY') as first_date,
    To_char(max(date)::date, 'DD Mon YYYY') as last_date,
    sum(concertnumber) as total_concerts,
    RANK() OVER(ORDER BY sum(concertnumber) DESC) AS RANK_SOLOIST,
    COUNT(*) AS NUMBER
     
FROM soloists join concerts using (id)
WHERE 
    orchestra LIKE 'New York Philharmonic' 
    and eventtype LIKE 'Subscription Season' 
    and soloist_name NOT IN ('Chorus', 
                             'No Soloist', 
                             'New York Choral Artists', 
                             'Schola Cantorum of NY')
    and soloist_name NOT LIKE ('%Choir%')
  
GROUP BY name
HAVING sum(concertnumber) <> 0
)
SELECT *
FROM   T
WHERE  RANK_SOLOIST <= NUMBER / 100.0

Solution 3:[3]

Just wanted to share my answer too. Alot of helped received from the responses by Gonzalo and SQLPro. Thx!

%%sql
postgresql:///orchestra
    
SELECT
    CONCAT(TRIM(BOTH FROM SUBSTRING(s.soloist_name,STRPOS(s.soloist_name,',')+2)),
           ' ',
           TRIM(BOTH FROM SUBSTRING(s.soloist_name,0,STRPOS(s.soloist_name,',')))) AS name,
    TO_CHAR(MIN(c.date) :: DATE, 'dd Mon yyyy') AS first_date,
    TO_CHAR(MAX(c.date) :: DATE, 'dd Mon yyyy') AS last_date,
    COUNT(s.id) AS total_concerts
FROM concerts AS c
JOIN soloists AS s
ON c.id = s.id
WHERE c.orchestra = 'New York Philharmonic'
    AND c.eventtype = 'Subscription Season'
    AND NOT s.soloist_name = 'Chorus'
    AND NOT s.soloist_name = 'No Soloist'
    AND NOT s.soloist_name = 'New York Choral Artists'
    AND NOT s.soloist_name = 'Schola Cantorum of NY'
    AND NOT s.soloist_name ~* 'choir'
GROUP BY s.soloist_name
ORDER BY total_concerts DESC, name ASC
LIMIT(
    SELECT CAST(COUNT(extra.name)*0.01 AS integer)
    FROM(
        SELECT 
            s.soloist_name AS name,
            COUNT(s.id) AS total_concerts
        FROM concerts AS c
        JOIN soloists AS s
        ON c.id = s.id
        WHERE c.orchestra = 'New York Philharmonic'
            AND c.eventtype = 'Subscription Season'
            AND NOT s.soloist_name = 'Chorus'
            AND NOT s.soloist_name = 'No Soloist'
            AND NOT s.soloist_name = 'New York Choral Artists'
            AND NOT s.soloist_name = 'Schola Cantorum of NY'
            AND NOT s.soloist_name ~* 'choir'
        GROUP BY s.soloist_name) AS extra);

Solution 4:[4]

This solution should work too. Slightly inspired by the previous answers.

%%sql
postgresql:///orchestra
    
SELECT
    SPLIT_PART(soloist_name,',',2) || ' ' || SPLIT_PART(soloist_name,',',1) AS name,
    TO_CHAR(MIN(date)::date, 'DD Mon YYYY') AS first_date,
    TO_CHAR(MAX(date)::date, 'DD Mon YYYY') AS last_date,
    COUNT(*) AS total_concerts
FROM concerts c
JOIN soloists s
ON c.id = s.id
WHERE c.orchestra = 'New York Philharmonic'
AND c.eventtype = 'Subscription Season'
AND s.soloist_name NOT IN ('Chorus','No Soloist','New York Choral Artists','Schola Cantorum of NY')
AND LOWER(s.soloist_name) NOT LIKE '%choir%'
GROUP BY soloist_name
ORDER BY total_concerts DESC, name ASC
LIMIT (
    SELECT COUNT(*) / 100 as onepercent
    FROM (
        SELECT s.soloist_name
        FROM concerts c
        JOIN soloists s
        ON c.id = s.id
        WHERE c.orchestra = 'New York Philharmonic'
        AND c.eventtype = 'Subscription Season'
        AND s.soloist_name NOT IN ('Chorus','No Soloist','New York Choral Artists','Schola Cantorum of NY')
        AND LOWER(s.soloist_name) NOT LIKE '%choir%'
        GROUP BY soloist_name
    ) AS s
)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 SQLpro
Solution 3 ripalo
Solution 4 Gallawander