'Simple fact-dim relationship lead to wrong averages
I'd like to answer a simple question: What is the avarage age of persons that own cars of specific colors ? ( Using Penatho BI, Yellowfin BI, MariaDB)
To do that the following model was designed. One fact table "person_fact" containing the columns:
id, name and age.
And a dimension table "auto_dim" containing the columns:
id ( foreign key to person_fact table), car_type, color and hp.
As we see one person can own more than one car!
The problem is that in some filter scenarios, the persons that own several cars are counted multiple times, where they shouldn't.
For instance, trying to calcualte the avarage age of persons that own a "Black" or a "Red" car leads to a wrong result !!!
The SQL query that is generated by Pentaho BI is the following:
select
avg(`person_fact`.`age`) as `m0`
from `auto_dim` as `auto_dim`,
`person_fact` as `person_fact`
where `person_fact`.`id` = `auto_dim`.`id`
and `auto_dim`.`color` in ('Black', 'Red');
The same problem occurs using Yellowfin BI. The SQL query generated by Yellowfin BI:
SELECT DISTINCT AVG(`PERSON_FACT`.`age`)
FROM `person_fact` AS `PERSON_FACT`
INNER JOIN `auto_dim` AS `AUTO_DIM`
ON (`PERSON_FACT`.`id` = `AUTO_DIM`.`id`)
WHERE (`AUTO_DIM`.`color` IN ('Black', 'Red'))
The correct answer is 28.25 !
NOTE: Calculating the avg age with excel yields the right answer using a DAX expression!!

Question:
- Is there any possibility to use something like DAX expression for Pentaho BI/Yellowfin BI to get right avg ?
- Should i use a another model to fix the problem ?
Thanks in advance !!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|



