How can I filter for records whose sum makes up X% of the total in Google Data Studio?
I have a data set containing clients and two dimensions: Lifetime and Revenue, e.g.
| Client | Lifetime | Revenue |
|---|---|---|
| Tesla | 7,4 | 280 |
| Amazon | 9,2 | 450 |
| Disney | 2,6 | 130 |
| Otto | 11,8 | 940 |
| BMW | 3,5 | 170 |
I am trying to calculate the average lifetime, but instead of averaging over all clients, I only want to include the top clients (by revenue) whose combined revenue makes up 80% of the total revenue.
When doing this manually the process would be quite clear:
- Sort the clients by revenue descending
- Calculate the total revenue and thus what 80% of the total is
- Check at which row the running sum of the clients' revenue exceeds that 80%, and only include the records up to that point
This is what the resulting table would look like:
| Client | Lifetime | Revenue | Running Sum of Rev |
|---|---|---|---|
| Otto | 11,8 | 940 | 940 |
| Amazon | 9,2 | 450 | 1390 |
| Tesla | 7,4 | 280 | 1670 |
| BMW | 3,5 | 170 | 1840 |
| Disney | 2,6 | 130 | 1970 |
Since the total revenue is 1970, 80% of that amounts to 1576. I would therefore want to select the top 3 (Otto, Amazon, Tesla) from the set, because their running sum (1670) is the first to exceed 1576. Then I could calculate the average lifetime of those three clients only.
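To make the intended logic concrete, here is a minimal Python sketch of the manual steps, run outside Data Studio. It is only an illustration of the calculation, not a Data Studio solution; the function name `top_clients_by_revenue_share` is hypothetical, and treating a running sum that exactly equals the threshold as "reached" is my assumption.

```python
# Sample data copied from the table above (decimal commas written as points).
clients = [
    ("Tesla", 7.4, 280),
    ("Amazon", 9.2, 450),
    ("Disney", 2.6, 130),
    ("Otto", 11.8, 940),
    ("BMW", 3.5, 170),
]


def top_clients_by_revenue_share(rows, share=0.8):
    """Return the top-revenue rows up to and including the row at which
    the running revenue sum first reaches share * total revenue."""
    # Step 1: sort the clients by revenue, descending.
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    # Step 2: compute the total revenue and the threshold (80% by default).
    threshold = share * sum(row[2] for row in ranked)
    # Step 3: accumulate a running sum and stop once the threshold is reached.
    # Assumption: a running sum equal to the threshold also counts as reached.
    selected, running_sum = [], 0
    for row in ranked:
        selected.append(row)
        running_sum += row[2]
        if running_sum >= threshold:
            break
    return selected


selected = top_clients_by_revenue_share(clients)
average_lifetime = sum(row[1] for row in selected) / len(selected)
print([row[0] for row in selected], average_lifetime)
```

With the sample data this selects Otto, Amazon and Tesla and averages their lifetimes only.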
I have no idea whether this can be automated in Google Data Studio. Filters only accept absolute values, and row order is not taken into account in data sets or blends either.
Alternative: I would already be quite happy with something like including only the top 80% of clients from the list of clients sorted by revenue (i.e. the 80% quantile). This would be the top 4 in the example (0.8 * 5 = 4), instead of the top 3 as in the original question, but that seems equally impossible.
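For comparison, the alternative could be sketched in Python like this, again outside Data Studio and only to pin down the intended result. The function name `top_percent_of_clients` is hypothetical, and rounding the client count down when it is not a whole number is my assumption.

```python
# Sample data copied from the question (decimal commas written as points).
clients = [
    ("Tesla", 7.4, 280),
    ("Amazon", 9.2, 450),
    ("Disney", 2.6, 130),
    ("Otto", 11.8, 940),
    ("BMW", 3.5, 170),
]


def top_percent_of_clients(rows, percent=80):
    """Return the top `percent` percent of rows, ranked by revenue descending."""
    ranked = sorted(rows, key=lambda row: row[2], reverse=True)
    # Integer arithmetic avoids float rounding; assumption: round down.
    keep = len(ranked) * percent // 100
    return ranked[:keep]


top_clients = top_percent_of_clients(clients)
top_average_lifetime = sum(row[1] for row in top_clients) / len(top_clients)
print([row[0] for row in top_clients], top_average_lifetime)
```

Here the cut-off depends only on the number of clients, so BMW is included even though the first three clients already cover more than 80% of the revenue.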
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
