'MySQL group and select only first from each group [duplicate]

I would like to perform a query that will select only the most recent item from a given group.


In this example, I'm tracking vans:

  • Each time they return to base, a check-in is recorded with information - mileage, etc...
  • Each time they make a delivery, a delivery is recorded - customer, etc...

This table lets us know the history for a given van. The data can be produced with a query or stored as we go - this isn't the problem.

 id | checkin_id | delivery_id | van_id
----+------------+-------------+--------
 24 | 15         | NULL        | 3
 25 | NULL       | 28          | 3
 26 | 16         | NULL        | 4
 27 | NULL       | 29          | 3
 28 | NULL       | 30          | 4
 29 | 17         | NULL        | 5

I can see the van's history by querying with ... WHERE van_id=3; - fine.

Conversely, I would like to be able to get a list of vans with their most recent "event". Resulting in this:

 id | checkin_id | delivery_id | van_id
----+------------+-------------+--------
 27 | NULL       | 29          | 3
 28 | NULL       | 30          | 4
 29 | 17         | NULL        | 5

I jumped to the following query:

SELECT * FROM `history` GROUP BY `van_id`;

But this returns the following error:

#1055 - Expression #1 of SELECT list is not in GROUP BY clause and contains nonaggregated column 'database.history.checkin_id' which is not functionally dependent on columns in GROUP BY clause; this is incompatible with sql_mode=only_full_group_by

After reading up, I understand what this is about and have to admit that my SQL is somewhat out of date - which of the items from the group do I want returned?

Adding checkin_id and delivery_id to the GROUP BY just shifts the problem - Ultimately I end up with the same set of data, just sorted differently.


This answer piqued my interest, and the graphic really helps to clearly outline the problem, thanks @azerafati!

I want to use the FIRST() or LAST() aggregate function - but MySQL doesn't appear to have them.

How do I reproduce this behaviour without processing all of the data in my application?



Solution 1:[1]

I guess your id values are unique, and later records have higher values than earlier records.

You need to use a subquery that gets the latest id for each van:

         SELECT MAX(id) id, van_id
           FROM history
          GROUP BY van_id

Then join that to your detail query.

 SELECT h.*
   FROM history h
   JOIN (
         SELECT MAX(id) id, van_id
           FROM history
          GROUP BY van_id
        ) m ON h.id = m.id AND h.van_id = m.van_id

But because your id values are unique you can simplify this even more.

 SELECT h.*
   FROM history h
   JOIN (
         SELECT MAX(id) id
           FROM history
          GROUP BY van_id
        ) m ON h.id = m.id

Solution 2:[2]

I was going to mark this as a duplicate because the question is actually asked fairly frequently, but I found those question/answers seem fairly hard to search for; so here is the generic template:

SELECT t.*
FROM theTable AS t
INNER JOIN (
   SELECT groupingValue, MIN(someValue) AS lowestValue
   FROM theTable
   GROUP BY groupingValue
) AS rIdent ON rIdent.groupingValue = t.groupingValue AND rIdent.lowestValue= t.someValue

lowest in your particular case being the min(id)...oh, oops; your question says first, but the detail says most recent (which I would interpret as last), so just use MAX instead of MIN. ...and the "groupingValue" is van_id.

Edit: The query should be fairly efficient if there is an index on the grouping fields and the field used to identify lowest/first/highest/recent .

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 O. Jones
Solution 2