for every dict in list of dict sort values by key
I have a large dataset of movies, each stored as a dictionary with keys such as title, year, and genres. I have bucketed them by genre as follows:
[
{'Action': [
{'title': 'They Live',
'year': 1988,
'genres': ['Action', 'Horror', 'Sci-Fi'],
'duration': 94,
'directors': ['John Carpenter'],
'actors': ['Roddy Piper', 'Keith David', 'Meg Foster'],
'rating': 7.3},
{'title': 'Ultra Warrior',
'year': 1990,
'genres': ['Action', 'Adventure', 'Sci-Fi'],
'duration': 81,
'directors': ['Augusto Tamayo San Román', 'Kevin Tent'],
'actors': ['Dack Rambo',
'Clare Beresford',
'Meshach Taylor',
'Mark Bringelson'],
'rating': 1.9},
{'title': 'Kick-Ass 2',
'year': 2013,
'genres': ['Action', 'Comedy', 'Crime'],
'duration': 103,
'directors': ['Jeff Wadlow'],
'actors': ['Aaron Taylor-Johnson', 'Chloë Grace Moretz'],
'rating': 6.5},
....
]
},
{'Drama': [
{'title': 'Dirty Beautiful',
'year': 2015,
'genres': ['Comedy', 'Drama', 'Romance'],
'duration': 95,
'directors': ['Tim Bartell'],
'actors': ['Ricky Mabe', 'Jordan Monaghan', 'Conor Leslie', 'Darin Heames'],
'rating': 5.5},
{'title': 'Honeydripper',
'year': 2007,
'genres': ['Crime', 'Drama', 'History'],
'duration': 124,
'directors': ['John Sayles'],
'actors': ['Danny Glover', 'LisaGay Hamilton', 'Yaya DaCosta'],
'rating': 6.6},
....
]
}
]
(I can't include the whole dataset here; this is just an example.)
How can I extract the median rating of the movies in each genre? Once I have the medians, I want to plot the median rating of each genre as a bar graph, but I'm a bit stuck.
My median function is written already:
def median(items):
    itemss = sorted(items)
    if len(itemss) % 2 != 0:
        return itemss[len(itemss) // 2]
    else:
        return (itemss[len(itemss) // 2] + itemss[len(itemss) // 2 - 1]) / 2
Here's what I tried:
median_rating = {}
for genre in genre_buckets:
    median_rating[genre] = median(genre_buckets[genre], key = lambda x:x['rating'])
median_rating
Solution 1:[1]
First, I might modify the data structure to remove the outer list:
{Action: [{movie1}, {movie2}], Drama: [{movie1}, {movie2}]}
instead of:
[{Action: [{movie1}, {movie2}]}, {Drama: [{movie1}, {movie2}]}]
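A minimal sketch of that conversion, assuming the outer list is named genre_buckets as in the question (the sample movies are shortened to title and rating, and movieDictionary is just the name used in the rest of this answer):

```python
# Sample in the question's shape: a list of single-key dicts (fields shortened).
genre_buckets = [
    {'Action': [{'title': 'They Live', 'rating': 7.3},
                {'title': 'Ultra Warrior', 'rating': 1.9},
                {'title': 'Kick-Ass 2', 'rating': 6.5}]},
    {'Drama': [{'title': 'Dirty Beautiful', 'rating': 5.5},
               {'title': 'Honeydripper', 'rating': 6.6}]},
]

# Merge the single-key dicts into one dict keyed by genre name.
movieDictionary = {}
for bucket in genre_buckets:
    for genre, movies in bucket.items():
        # setdefault + extend keeps every movie even if a genre
        # shows up in more than one bucket.
        movieDictionary.setdefault(genre, []).extend(movies)

print(list(movieDictionary))  # ['Action', 'Drama']
```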
Here's how I would get the medians with that new data structure:
for genre, movies in movieDictionary.items():  # .items() yields pairs like (genreName, [{movie1}, {movie2}])
    ratings = [movie['rating'] for movie in movies]  # list of all of the ratings in this genre
    genreMedian = median(ratings)  # calls your function
    print(genre, genreMedian)  # prints results
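For the bar graph, it may be easier to collect the medians into a dict instead of printing them. The sketch below is one possible way to do that, not the only one. It assumes the flattened movieDictionary shape described above, uses the standard library's statistics.median in place of your own function (either should work), and assumes matplotlib is installed for the plot. The sample data and the helper name plot_medians are made up for illustration.

```python
from statistics import median  # stdlib median; the question's own median() would also work

# Hypothetical sample in the flattened shape (fields shortened).
movieDictionary = {
    'Action': [{'title': 'They Live', 'rating': 7.3},
               {'title': 'Ultra Warrior', 'rating': 1.9},
               {'title': 'Kick-Ass 2', 'rating': 6.5}],
    'Drama': [{'title': 'Dirty Beautiful', 'rating': 5.5},
              {'title': 'Honeydripper', 'rating': 6.6}],
}

# Map each genre name to the median of its movies' ratings.
median_rating = {
    genre: median(movie['rating'] for movie in movies)
    for genre, movies in movieDictionary.items()
}
print(median_rating)


def plot_medians(medians):
    """Draw one bar per genre (hypothetical helper; needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported here so the code above runs without it
    plt.bar(list(medians.keys()), list(medians.values()))
    plt.xlabel('Genre')
    plt.ylabel('Median rating')
    plt.title('Median rating per genre')
    plt.show()


# Uncomment to show the chart:
# plot_medians(median_rating)
```

Keeping the medians in a dict means the genre names and values stay paired, so they can be passed straight to plt.bar as the x labels and bar heights.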
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
