'Display percentage of registered members that have not rated a Movie
I have the following three tables. See full db<>fiddle here
members
| member_id | first_name | last_name |
|---|---|---|
| 1 | Roby | Dauncey |
| 2 | Isa | Garfoot |
| 3 | Sullivan | Carletto |
| 4 | Jacintha | Beacock |
| 5 | Mikey | Keat |
| 6 | Cindy | Stenett |
| 7 | Alexina | Deary |
| 8 | Perkin | Bachmann |
| 10 | Suzann | Genery |
| 39 | Horatius | Baukham |
| 41 | Bendicty | Willisch |
movies
| movie_id | movie_name | movie_genre |
|---|---|---|
| 10 | The Bloody Olive | Comedy,Crime,Film-Noir |
| 56 | Attack of The Killer Tomatoes | (no genres listed) |
ratings
| rating_id | movie_id | member_id | rating |
|---|---|---|---|
| 19 | 10 | 39 | 2 |
| 10 | 56 | 41 | 1 |
Now the question is:
Out of the total number registered members, how many have actually left a movie rating? Display the result as a percentage
This is what I have tried:
SELECT CONVERT(VARCHAR,(CONVERT(FLOAT,COUNT([Number of Members])) / CONVERT(FLOAT,COUNT(*)) * 100)) + '%'
AS 'Members Percentage'
FROM (
SELECT COUNT(*) AS 'Number of Members'
FROM members
WHERE member_id IN (
SELECT member_id FROM members
EXCEPT
SELECT member_id FROM ratings
)
) MembersNORatings
And my query result is displaying as 100%. Which is obvious that the result is wrong.
| Members Percentage |
|---|
| 100% |
What I figured out was that in the first line of the query:
COUNT(*) value is being recognized as the value equivalent to the alias [Number of Members]. That's why it is showing 100%.
I thought of replacing COUNT(*) with SELECT COUNT(*) FROM members but before I try to run the query, it was showing error saying
Incorrect Syntax near SELECT.
What change do I need to make in my existing query in order to get the proper percentage result?
Solution 1:[1]
As mentioned in the comments, there are several ways to approach this. For example:
Option #1 - OUTER JOIN + DISTINCT
SELECT TotalMembers
, TotalMembersWithRatings
, CAST( 100.0 * TotalMembersWithRatings
/ NULLIF(TotalMembers, 0 )
AS DECIMAL(10,2)) AS MemberPercentage
FROM (
SELECT COUNT(DISTINCT m.member_id) AS TotalMembers
, COUNT(DISTINCT r.member_id) AS TotalMembersWithRatings
FROM members m LEFT JOIN ratings r ON r.member_id = m.member_id
) t
Option #2 - CTE + ROW_NUMBER()
WITH memberRatings AS (
SELECT member_id, ROW_NUMBER() OVER(
PARTITION BY member_id
ORDER BY member_id
) AS RowNum
FROM ratings
)
SELECT COUNT(mr.member_id) AS TotalMembers
, COUNT(mr.member_id) AS TotalWithRatings
, CAST( 100.0 * COUNT(mr.member_id)
/ NULLIF(COUNT(m.member_id), 0 )
AS DECIMAL(10,2)) AS MemberPercentage
FROM members m LEFT JOIN memberRatings mr ON mr.member_id = m.member_id
AND mr.RowNum = 1
SELECT
COUNT(*) TotalMembers
, SUM(r.HasRating) TotalWithRatings
, CONVERT(decimal(9,2), 100 * sum(r.HasRating) / (count(*) * 1.0)) PercentageWithRatings
FROM members m
CROSS APPLY (
SELECT CASE WHEN exists (select 1 from ratings r where r.member_id = m.member_id) THEN 1
ELSE 0
END
) r (HasRating);
Execution Plans - Take #1
There's a LOT more to analyzing execution plans than just comparing a single number. However, high level plans do provide some useful indicators.
With the small data samples provided, the plans suggest options #2 (CTE) and #3 (APPLY) are likely to be the most performant (19%), and option #1 (OUTER JOIN + DISTINCT) the least at (63%), likely due to the count(distinct) which can often be slower than alternative options.
Original Sample Size:
| TableName | TotalRows |
|---|---|
| movies | 50 |
| members | 50 |
| ratings | 50 |
Execution Plans - Take #2
However, populate the tables with more than a few sample rows of data and the same rough comparison produces a different result. Option #2 (CTE) still seems likely to be the least expensive query (9%), but Option #3 (APPLY) is now the most expensive (76%). You can see the majority of that cost is the index spool used due to how APPLY operates:
New Sample Size
| TableName | TotalRows |
|---|---|
| movies | 4105 |
| members | 29941 |
| ratings | 14866 |
New Execution Plans
With the increased amount of data, STATISTICS IO shows option #2 has far less logical reads and scans and option #3 (APPLY) which as has the most. While Option #1, which appears to have a lower cost overall (15%) it still has a much higher number of logical reads. (Add a non-clustered index on member_id and movie_id and the numbers, while similar, change once again.) So don't just look at a single number.
While overall, option #2 (CTE) would seem likely to be most efficient, there are a lot of factors involved (indexes, data volume, statistics, version, etc), so you should examine the actual execution plans in your own environment.
As with most things, the answer as to which is best is: it depends.
Solution 2:[2]
You can use a cross apply to determine using a sub-query whether a given member has left a rating or not (because you can't use a sub-query in an aggregation). Then divide (ensuring you use decimal division, not integer) to get the percentage.
select
count(*) TotalMembers
, sum(r.HasRating) TotalWithRatings
, convert(decimal(9,2), 100 * sum(r.HasRating) / (count(*) * 1.0)) PercentageWithRatings
from #members m
cross apply (
select case when exists (select 1 from #ratings r where r.member_id = m.member_id) then 1 else 0 end
) r (HasRating);
Returns:
| TotalMembers | TotalWithRatings | PercentageWithRatings |
|---|---|---|
| 50 | 2 | 4.00 |
Solution 3:[3]
Late to the party, but you don't need to join the tables if you only want to know how many members made a rating, not who.
What you need is
- count entries in members table
- count (distinct) members in ratings
- get quota of 'rating' members (rating members divided by total members)
- to get nonrating members, substract the quota from 1.0
- multiply with 100 to get the percent value
This is how you could do the calculation step by step using CTEs:
with count_members as (
select count(member_id) as member_count from members
), count_raters as (
select count(distinct member_id) as rater_count from ratings
), convert_both as (
select top 1
cast(m.member_count as decimal(10,2)) as member_count,
cast(r.rater_count as decimal(10,2)) as rater_count
from count_members as m cross join count_raters as r
), calculate_quota as (
select (rater_count / member_count) as quota from convert_both
), invert_quota as (
select (1.0 - quota) as quota from calculate_quota
)
select (quota * 100) as percentage from invert_quota;
Alternatively, that's how you could roll it all into one:
select (
(1.0 - (
cast((select count(distinct member_id) from ratings) as decimal(10,2))
/
cast((select count(member_id) from members) as decimal(10,2))
) ) * 100
) as percentage;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 |



