'How to use NTILE function to create two groups for two columns where one is a nested NTILE of the other?

I have financial data that I want to categorise (using NTILE()) by two columns that contain percentage values (risk, debt_to_assets). My end goal is that I want to aggregate some other column (profit) by the two categories, but that's not related to my issue at hand. The data looks something like this:

profit risk debt_to_assets
7000 0.10 0.20
1000 0.40 0.70
3000 0.15 0.50
4000 0.30 0.30
2000 0.20 0.60

The issue I'm trying to solve is that I want the categories to be nested such that, in addition to the population distribution being uniform, the categories of the inner quantile are consistent across the categories of the outer quantile in terms of the range that defines the categories (i.e. I want the min and max value for the inner categories (x0, y0), (x1, y0), (x2, y0), ... to all be the same or as close as possible, where the x's are the outer category and the y's are the inner category).

Ideally if I were to aggregate the columns used for the NTILE() function (using 3 inner categories and 3 outer categories for example) I'd want a table that resembles the following:

risk_cat dta_cat min_risk max_risk min_dta max_dta count
1 1 0.00 0.33 0.00 0.33 100
1 2 0.00 0.33 0.34 0.67 100
1 3 0.00 0.33 0.68 1.00 100
2 1 0.34 0.67 0.00 0.33 100
2 2 0.34 0.67 0.34 0.67 100
2 3 0.34 0.67 0.68 1.00 100
3 1 0.68 1.00 0.00 0.33 100
3 2 0.68 1.00 0.34 0.67 100
3 3 0.68 1.00 0.68 1.00 100

These are the solutions I've tried but they only solve part of the issue, not the whole thing:

SELECT *,
    NTILE(3) OVER (
        ORDER BY risk
    ) AS risk_cat,
    NTILE(3) OVER (
        ORDER BY debt_to_assets
    ) AS dta_cat
FROM my_table

This would result in an aggregated table like this:

risk_cat dta_cat min_risk max_risk min_dta max_dta count
1 1 0.00 0.33 0.00 0.33 10
1 2 0.00 0.33 0.34 0.67 55
1 3 0.00 0.33 0.68 1.00 180
2 1 0.34 0.67 0.00 0.33 135
2 2 0.34 0.67 0.34 0.67 140
2 3 0.34 0.67 0.68 1.00 100
3 1 0.68 1.00 0.00 0.33 130
3 2 0.68 1.00 0.34 0.67 110
3 3 0.68 1.00 0.68 1.00 40

The problem is that the count across the two categories isn't uniform.

WITH outer_cat AS (
    SELECT *,
        NTILE(3) OVER (
            ORDER BY risk
        ) AS risk_cat
    FROM my_table
)
SELECT *,
    NTILE(3) OVER(
        PARTITION BY risk_cat
        ORDER BY debt_to_assets
    ) AS dta_cat
FROM outer_cat

The aggregated table for this might resemble the following:

risk_cat dta_cat min_risk max_risk min_dta max_dta count
1 1 0.00 0.33 0.10 0.70 100
1 2 0.00 0.33 0.71 0.90 100
1 3 0.00 0.33 0.91 1.00 100
2 1 0.34 0.67 0.05 0.35 100
2 2 0.34 0.67 0.36 0.60 100
2 3 0.34 0.67 0.61 0.90 100
3 1 0.68 1.00 0.00 0.25 100
3 2 0.68 1.00 0.26 0.50 100
3 3 0.68 1.00 0.51 0.80 100

The problem this time is that the min and max values for the inner category vary to much across the outer category.

SELECT *,
    NTILE(9) OVER(
        ORDER BY risk, debt_to_assets
    ) AS dual_cat
FROM my_table

The aggregated table for this looks something like the following:

dual_cat min_risk max_risk min_dta max_dta count
1 0.00 0.11 0.55 1.00 100
2 0.12 0.22 0.35 1.00 100
3 0.23 0.33 0.15 1.00 100
4 0.34 0.44 0.40 1.00 100
5 0.45 0.55 0.10 1.00 100
6 0.56 0.66 0.10 0.95 100
7 0.67 0.77 0.05 1.00 100
8 0.78 0.88 0.20 1.00 100
9 0.89 1.00 0.00 1.00 100

This was just a last attempt at a solution after the previous two didn't work. This attempt didn't capture any of the behaviour that I was looking for.

Is there a solution to my problem that I'm not seeing?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source