'postgresql selecting the most representative value

I have a table in which objects have ids and they have names. The ids are correct by definition, the names are almost always correct, but sometimes dirty incoming data causes names to be null or even wrong.

So I do a query like

SELECT id, name, AGGR1(a) as a, AGGR2(b) as b, AGGR3(c) as c
FROM my_table
WHERE d = 3
GROUP BY id

I'd like to have name in the results, but of course the above is wrong. I'd have to group on id, name, in which case what should be one row sometimes becomes more than one -- say, id 2 has names 'John' (correct), 'Jon' (no, but only 1%), or NULL (also a small fraction).

Is there a construct or idiom in postgresql that lets me select what a human looking at the list would say is obviously the consensus name?

(I hear our postgres installation is finally being upgraded soon, if that matters here.)

sample output, in case prose wasn't clear

SELECT id, name, COUNT(id) as c
FROM my_table
WHERE d = 3
GROUP BY id

id    name       c

2     John    2000
2     Jon        3
2     (NULL)     5

id    name       c

2     John    2008

postgresql postgresql-9.5

Solution 1:^[1]

You can get the names with

WITH names as (
  SELECT 
    id, 
    name, 
    ROW_NUMBER() OVER (PARTITION BY id ORDER BY COUNT(1) DESC) as rn
  FROM my_table
  GROUP BY id, name
)
SELECT id, name
FROM names
WHERE rn=1;

and then do your calculations by id only, joining names from this query.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Vesa Karjalainen

'postgresql selecting the most representative value

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]