'Fetch minimum value for each NTILE bucket in Hive

I am trying to partition the data into percentiles (100 equal buckets) using NTILE window function for each merchant_id ordered by score column. The output of the query will contain merchant_id, score, and percentile for every record in the source table. (Sample code below)

CREATE TABLE merchant_score_ntiles
AS
SELECT merchant_id, score, NTILE(100) OVER (PARTITION BY merchant_id ORDER BY score DESC) as percentile
FROM merch_table

This will return sample output as follows:

merchant_id,score,percentile
1001,900,1
1001,800,1
1001,760,1
1002,900,2
1002,800,2
1002,750,2

Is there a way we can return only the minimum score for each merchant_id based on percentile column such as below?

merchant_id,score,percentile
1001,760,1
1002,750,2



Solution 1:[1]

You can try to use ROW_NUMBER window function in subquery before using NTILE window function

SELECT merchant_id, 
       score, 
       NTILE(100) OVER (PARTITION BY merchant_id ORDER BY score DESC) as percentile
FROM (
    SELECT *,ROW_NUMBER() OVER(PARTITION BY merchant_id ORDER BY score) rn
    FROM merch_table
) t1
WHERE rn = 1

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1