How to filter a list of dictionaries in Python?

I have a list of dictionaries, as follows:

VehicleList = [
        {
            'id': '1',
            'VehicleType': 'Car',
            'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000)
        },
        {
            'id': '2',
            'VehicleType': 'Bike',
            'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)
        },
        {
            'id': '3',
            'VehicleType': 'Truck',
            'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)
        },
        {
            'id': '4',
            'VehicleType': 'Bike',
            'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 00, 300012)
        },
        {
            'id': '5',
            'VehicleType': 'Car',
            'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)
        }
    ]

How can I get a list of the latest vehicles for each 'VehicleType' based on their 'CreationDate'?

I expect something like this:

latestVehicles = [
        {
            'id': '5',
            'VehicleType': 'Car',
            'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)
        },
        {
            'id': '2',
            'VehicleType': 'Bike',
            'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)
        },
        {
            'id': '3',
            'VehicleType': 'Truck',
            'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)
        }
    ]

I tried separating out each dictionary based on their 'VehicleType' into different lists and then picking up the latest one.

I believe there might be a more optimal way to do this.



Solution 1:[1]

Here is a solution using max and filter. The vehicle types are collected in a set, so the order of the result is not guaranteed:

VehicleLatest = [
    max(
        filter(lambda _: _["VehicleType"] == t, VehicleList), 
        key=lambda _: _["CreationDate"]
    ) for t in {_["VehicleType"] for _ in VehicleList}
]

Result

print(VehicleLatest)
# [{'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)}, {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}, {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)}]
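
If the result should follow the order in which each type first appears (as in the expected output in the question), one option is to collect the types with dict.fromkeys instead of a set. This is only a sketch of that variation, not part of the original answer; the name vehicle_types is made up for illustration.

```python
import datetime

# Compact copy of the sample data from the question.
VehicleList = [
    {'id': '1', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000)},
    {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
    {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)},
    {'id': '4', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 0, 300012)},
    {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
]

# dict.fromkeys keeps the first-seen order of its keys, unlike a set.
# vehicle_types is a hypothetical name used only in this sketch.
vehicle_types = dict.fromkeys(v["VehicleType"] for v in VehicleList)

VehicleLatest = [
    # For each type, take the vehicle with the newest CreationDate.
    max(
        (v for v in VehicleList if v["VehicleType"] == t),
        key=lambda v: v["CreationDate"],
    )
    for t in vehicle_types
]

print([v["id"] for v in VehicleLatest])
# ['5', '2', '3']
```

Like the original, this scans the whole list once per vehicle type, which is fine for a handful of types.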

Solution 2:[2]

I think you can achieve what you want using the groupby function from itertools.

from itertools import groupby

# entries sorted according to the key we wish to groupby: 'VehicleType'
VehicleList = sorted(VehicleList, key=lambda x: x["VehicleType"])

latestVehicles = []

# Then the elements are grouped.
for k, v in groupby(VehicleList, lambda x: x["VehicleType"]):
    # We then append to latestVehicles the 0th entry of the
    # grouped elements after sorting according to the 'CreationDate'
    latestVehicles.append(sorted(list(v), key=lambda x: x["CreationDate"], reverse=True)[0])

Solution 3:[3]

Sort by 'VehicleType' and 'CreationDate', then build a dictionary that maps each 'VehicleType' to a vehicle. Later entries overwrite earlier ones with the same key, so each type ends up with its latest vehicle:

VehicleList.sort(key=lambda x: (x.get('VehicleType'), x.get('CreationDate')))
out = list(dict(zip([item.get('VehicleType') for item in VehicleList], VehicleList)).values())

Output:

[{'id': '2',
  'VehicleType': 'Bike',
  'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
 {'id': '5',
  'VehicleType': 'Car',
  'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
 {'id': '3',
  'VehicleType': 'Truck',
  'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}]

Solution 4:[4]

This is straightforward in pandas. First load the list of dicts as a pandas DataFrame, sort the values by date in descending order, keep only the first row for each 'VehicleType', and export to a list of dicts. Note that the dates come back as pandas Timestamp objects.

import pandas as pd

df = pd.DataFrame(VehicleList)
df.sort_values('CreationDate', ascending=False).drop_duplicates('VehicleType').to_dict(orient='records')

Solution 5:[5]

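You can use itemgetter from the operator module to sort by type and date. After sorting, the last entry for each 'VehicleType' is the latest one, so a dict keyed by type keeps exactly those entries:

import operator
my_sorted_list_by_type_and_date = sorted(VehicleList, key=operator.itemgetter('VehicleType', 'CreationDate'))
# Later entries overwrite earlier ones with the same key, so each type keeps its latest vehicle.
latestVehicles = list({v['VehicleType']: v for v in my_sorted_list_by_type_and_date}.values())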

Solution 6:[6]

A small plea for more readable code:

from operator import itemgetter
from itertools import groupby

vtkey = itemgetter('VehicleType')
cdkey = itemgetter('CreationDate')

latest = [
    # Get latest from each group.
    max(vs, key = cdkey)
    # Sort and group by VehicleType.
    for g, vs in groupby(sorted(VehicleList, key = vtkey), vtkey)
]

Solution 7:[7]

A variation on Blckknght's answer using defaultdict to avoid the long if condition:

from collections import defaultdict
import datetime
from operator import itemgetter

latest_dict = defaultdict(lambda: {'CreationDate': datetime.datetime.min})

for vehicle in VehicleList:
    t = vehicle['VehicleType']
    latest_dict[t] = max(vehicle, latest_dict[t], key=itemgetter('CreationDate'))

latestVehicles = list(latest_dict.values())

latestVehicles:

[{'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
 {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
 {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)}]
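
The answer this one refers to is not reproduced here. As a rough sketch of the same single-pass idea with a plain dict and an explicit condition (not necessarily what that answer looked like; the name latest_by_type is made up for illustration):

```python
import datetime

# Compact copy of the sample data from the question.
VehicleList = [
    {'id': '1', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 10, 16, 9, 44, 872000)},
    {'id': '2', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 15, 11, 8, 21, 612000)},
    {'id': '3', 'VehicleType': 'Truck', 'CreationDate': datetime.datetime(2021, 9, 13, 10, 1, 50, 350095)},
    {'id': '4', 'VehicleType': 'Bike', 'CreationDate': datetime.datetime(2021, 12, 10, 21, 1, 0, 300012)},
    {'id': '5', 'VehicleType': 'Car', 'CreationDate': datetime.datetime(2021, 12, 21, 10, 1, 50, 600095)},
]

# Maps each VehicleType to the newest vehicle seen so far.
# latest_by_type is a hypothetical name used only in this sketch.
latest_by_type = {}

for vehicle in VehicleList:
    t = vehicle['VehicleType']
    current = latest_by_type.get(t)
    # Store the vehicle if its type is new or it is newer than the stored one.
    if current is None or vehicle['CreationDate'] > current['CreationDate']:
        latest_by_type[t] = vehicle

latestVehicles = list(latest_by_type.values())

print([v['id'] for v in latestVehicles])
# ['5', '2', '3']
```

This walks the list once and needs no placeholder entry with datetime.datetime.min, at the cost of the explicit if condition.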

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Benjamin Rowell
Solution 3
Solution 4 RJ Adriaansen
Solution 5 Charley
Solution 6 FMc
Solution 7