For every dict in a list of dicts, sort values by key

I have a large dataset of movies, each stored as a dictionary with keys like title, year, genres... I have them bucketed by genre in a list of dictionaries, as follows:

[
  {'Action': [
    {'title': 'They Live',
    'year': 1988,
    'genres': ['Action', 'Horror', 'Sci-Fi'],
    'duration': 94,
    'directors': ['John Carpenter'],
    'actors': ['Roddy Piper', 'Keith David', 'Meg Foster'],
    'rating': 7.3},
    {'title': 'Ultra Warrior',
    'year': 1990,
    'genres': ['Action', 'Adventure', 'Sci-Fi'],
    'duration': 81,
    'directors': ['Augusto Tamayo San Román', 'Kevin Tent'],
    'actors': ['Dack Rambo',
    'Clare Beresford',
    'Meshach Taylor',
    'Mark Bringelson'],
    'rating': 1.9},
    {'title': 'Kick-Ass 2',
    'year': 2013,
    'genres': ['Action', 'Comedy', 'Crime'],
    'duration': 103,
    'directors': ['Jeff Wadlow'],
    'actors': ['Aaron Taylor-Johnson', 'Chloë Grace Moretz'],
    'rating': 6.5},
    ....
    ]
  },
  {'Drama': [
    {'title': 'Dirty Beautiful',
    'year': 2015,
    'genres': ['Comedy', 'Drama', 'Romance'],
    'duration': 95,
    'directors': ['Tim Bartell'],
    'actors': ['Ricky Mabe', 'Jordan Monaghan', 'Conor Leslie', 'Darin Heames'],
    'rating': 5.5},
    {'title': 'Honeydripper',
    'year': 2007,
    'genres': ['Crime', 'Drama', 'History'],
    'duration': 124,
    'directors': ['John Sayles'],
    'actors': ['Danny Glover', 'LisaGay Hamilton', 'Yaya DaCosta'],
    'rating': 6.6},
    ....
    ]
  }
]
  • I can't include the whole dataset here; this is just an example.

How could I extract the median rating of movies from each genre? Once I extract it I'm trying to plot the median ratings in each genre as a bar graph... but I'm a bit stuck.

My median function is written already:

def median(items):
  itemss = sorted(items)
  if len(itemss) % 2 != 0:
    return itemss[len(itemss) // 2]
  else:
    return (itemss[len(itemss) // 2] + itemss[len(itemss) // 2 - 1]) / 2

Here's what I tried:

median_rating = {}

for genre in genre_buckets:
  median_rating[genre]= median(genre_buckets[genre], key = lambda x:x['rating'])

median_rating


Solution 1:[1]

First, I might modify the data structure to remove the outer list:

{Action: [{movie1}, {movie2}], Drama: [{movie1}, {movie2}]}

instead of:

[{Action: [{movie1}, {movie2}]}, {Drama: [{movie1}, {movie2}]}]
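
If the data already exists in the list-of-dicts shape, one way to get to the flatter structure is to merge the one-key dictionaries. This is only a rough sketch: the sample data is a trimmed-down stand-in for the real dataset, and `genre_buckets` and `movieDictionary` are just the names used in the question and in this answer.

```python
# Trimmed-down sample in the question's original shape: a list of one-key dicts.
genre_buckets = [
    {'Action': [{'title': 'They Live', 'rating': 7.3},
                {'title': 'Ultra Warrior', 'rating': 1.9}]},
    {'Drama': [{'title': 'Honeydripper', 'rating': 6.6}]},
]

# Merge every one-key dict into a single {genre: [movies]} dictionary.
movieDictionary = {}
for bucket in genre_buckets:
    for genre, movies in bucket.items():
        # setdefault + extend keeps all movies even if a genre key appears twice
        movieDictionary.setdefault(genre, []).extend(movies)

print(list(movieDictionary))
```

After this, each genre name maps directly to its list of movies, so there is no need to index into the outer list.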

Here's how I would get the medians with that new data structure:

for genre, movies in movieDictionary.items():  # .items() yields (genreName, [{movie1}, {movie2}, ...]) pairs
    ratings = [movie['rating'] for movie in movies]  # collect every rating in this genre
    genreMedian = median(ratings)  # calls your function

    print(genre, genreMedian)  # prints the genre name and its median rating
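
Since the goal is to plot the medians, it may be handier to collect them in a dictionary instead of printing them. Here is a minimal sketch, assuming the flattened `{genre: [movies]}` structure and using `statistics.median` from the standard library in place of the hand-written function (either should work). The sample data is cut down to titles and ratings.

```python
from statistics import median  # standard-library median; the question's own function would also work

# Cut-down sample in the flattened {genre: [movies]} shape.
movieDictionary = {
    'Action': [{'title': 'They Live', 'rating': 7.3},
               {'title': 'Ultra Warrior', 'rating': 1.9},
               {'title': 'Kick-Ass 2', 'rating': 6.5}],
    'Drama': [{'title': 'Dirty Beautiful', 'rating': 5.5},
              {'title': 'Honeydripper', 'rating': 6.6}],
}

# Build {genre: median rating} in one pass over the genres.
median_rating = {
    genre: median(movie['rating'] for movie in movies)
    for genre, movies in movieDictionary.items()
}

print(median_rating)
```

The keys and values of `median_rating` can then be handed to a plotting library. With matplotlib installed, something like `plt.bar(list(median_rating), list(median_rating.values()))` should draw one bar per genre.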

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
