'Fetch minimum value for each NTILE bucket in Hive
I am trying to partition the data into percentiles (100 equal buckets) using NTILE window function for each merchant_id ordered by score column. The output of the query will contain merchant_id, score, and percentile for every record in the source table. (Sample code below)
CREATE TABLE merchant_score_ntiles
AS
SELECT merchant_id, score, NTILE(100) OVER (PARTITION BY merchant_id ORDER BY score DESC) as percentile
FROM merch_table
This will return sample output as follows:
merchant_id,score,percentile
1001,900,1
1001,800,1
1001,760,1
1002,900,2
1002,800,2
1002,750,2
Is there a way we can return only the minimum score for each merchant_id based on percentile column such as below?
merchant_id,score,percentile
1001,760,1
1002,750,2
Solution 1:[1]
You can try to use ROW_NUMBER window function in subquery before using NTILE window function
SELECT merchant_id,
score,
NTILE(100) OVER (PARTITION BY merchant_id ORDER BY score DESC) as percentile
FROM (
SELECT *,ROW_NUMBER() OVER(PARTITION BY merchant_id ORDER BY score) rn
FROM merch_table
) t1
WHERE rn = 1
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
