How to extract data based on specific keys from multiple dicts in a list (Python)
Requirements are:
- Each record in the file has six main fields: date, url, title, lang, jsonld, metatags. The metatags field holds keys and values. Please extract the data whose key is any of the keys in key_list.
matrix = ['date', 'url', 'title', 'lang', 'jsonld', 'metatags']
key_list = ['description', 'og:description','twitter:description', 'article:tag', 'sailthru.description','parsely-tags', 'abstract', 'og:title']
The key is inside the 'metatags'
The data looks like this (opens with json.loads):
sample 1 The data looks like this containing 'date', 'url', 'title', 'lang', 'jsonld', 'metatags'
sample 2 The data looks like this containing 'date', 'url', 'title', 'lang', 'jsonld', 'metatags'
sample 3 The data looks like this containing 'date', 'url', 'title', 'lang', 'jsonld', 'metatags'
- Based on the dataset generated from above, extract the values of the corresponding keys in the key list and filter out the data that contain any mentions of companies in the company_list. Generate sentiments for each article based on metatags.
Company list: ['Facebook', 'Amazon', 'Microsoft', 'Starbucks', "McDonald's", 'Walmart', 'Tencent', 'Dutch Shell', 'Volkswagen', 'Apple']
#I have tried several ways:
for key_list in range(len(data0)):
    for matrix in data0[i].items():
        print(matrix)
#or
next(item for item in data0 if item.get(matrix) and item['metatags'] in key_list)
#Error: StopIteration
#or
def search(key, data0):
    return [element for element in data0 if data0['key'] in key_list]
None of these work. Could anyone give me some help with these 2 requirements, please?
I really want to figure out how to meet these 2 requirements. They need to be implemented in Python.
Solution 1:[1]
In order to solve this, try to understand the data structure you are working with.
In your attempts, it seems you are struggling with obtaining the record dictionaries, so try to get to those first (currently, 90% of your question focusses on obtaining data from metatags of specific records, filtered on specific keys and further filtered on specific companies, all steps to solve later on).
What is the data0 variable that you use in your code? Based on the samples you provided, I'll assume it is a list with the record dictionaries as entries. In that case, can't you just loop over it like this?
for record in data0:
    ...  # do work on the record dictionary
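As a minimal sketch of that loop, assuming the file holds one JSON record per line (the question only says the data is opened with json.loads, so the layout is a guess), it could look like this. The two records in raw_lines are made up for illustration.

```python
import json

# Hypothetical input: one JSON record per line (an assumption about the file layout).
raw_lines = [
    '{"date": "2021-01-01", "url": "https://example.com/a", "title": "A", '
    '"lang": "en", "jsonld": [], "metatags": []}',
    '{"date": "2021-01-02", "url": "https://example.com/b", "title": "B", '
    '"lang": "en", "jsonld": [], "metatags": []}',
]

# data0 becomes a list of record dictionaries, as assumed in the answer.
data0 = [json.loads(line) for line in raw_lines]

for record in data0:
    # Each record is a plain dict, so its fields are read by name.
    print(record["url"], sorted(record.keys()))
```

If data0 turns out to be a single dictionary rather than a list, the loop would iterate over its keys instead, which is worth checking with type(data0) first.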
The further work to do on each record dictionary would be something like this:
- Get the list metatags inside your record dictionary
- Loop over each dictionary in your metatags list
- If the key entry is in key_list, store it in some data structure (up to you to decide what would work best for you) together with the value entry
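The steps above can be sketched roughly as follows. This assumes that metatags is a list of dictionaries, each with a key entry and a value entry; the real samples are not shown in the question, so that shape is a guess. The helper name extract_metatags and the sample record are hypothetical.

```python
key_list = ['description', 'og:description', 'twitter:description', 'article:tag',
            'sailthru.description', 'parsely-tags', 'abstract', 'og:title']


def extract_metatags(record, wanted_keys):
    """Return (key, value) pairs from record['metatags'] whose key is wanted."""
    wanted = set(wanted_keys)
    pairs = []
    # A missing or empty metatags field is treated as an empty list.
    for tag in record.get("metatags") or []:
        # Assumed shape of each entry: {"key": ..., "value": ...}
        if tag.get("key") in wanted:
            pairs.append((tag["key"], tag.get("value")))
    return pairs


# Hypothetical record, made up for illustration.
sample_record = {
    "url": "https://example.com/a",
    "metatags": [
        {"key": "og:title", "value": "Apple opens a new store"},
        {"key": "viewport", "value": "width=device-width"},
        {"key": "description", "value": "A short summary"},
    ],
}

print(extract_metatags(sample_record, key_list))
```

A list of tuples keeps duplicate keys (for example several article:tag entries), which a plain dict would silently overwrite.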
Once you have all this data, I guess you have to loop over all entries you gathered and look at which value entries contain one of the companies you provided.
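One possible way to do that check is a case-insensitive substring match, sketched below. The helper name mentioned_companies and the pairs list are hypothetical; the pairs stand in for whatever structure you chose to store the key and value entries.

```python
company_list = ['Facebook', 'Amazon', 'Microsoft', 'Starbucks', "McDonald's",
                'Walmart', 'Tencent', 'Dutch Shell', 'Volkswagen', 'Apple']


def mentioned_companies(text, companies):
    """Return the companies whose name appears in text (case-insensitive substring match)."""
    lowered = str(text).lower()
    return [name for name in companies if name.lower() in lowered]


# Hypothetical (key, value) pairs gathered from the metatags of one record.
pairs = [
    ("og:title", "Apple opens a new store"),
    ("description", "A short summary"),
    ("article:tag", "Walmart, Amazon"),
]

# Keep only the entries that mention at least one company.
matches = [(key, value, mentioned_companies(value, company_list)) for key, value in pairs]
matches = [entry for entry in matches if entry[2]]

for key, value, companies in matches:
    print(key, value, companies)
```

Substring matching is crude: "Apple" would also match "pineapple", so a word-boundary check may be a better fit for real data. If the requirement means dropping the entries that mention a company rather than keeping them, invert the final condition.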
In order to not solve your homework, I think it is best that you focus on getting the records and try implementing the further steps yourself. It is better to come back when you run into a specific issue (i.e. a specific step in your problem) instead of posting the complete question. If my assumption about data0 is wrong, then please clarify.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Steven Robyns |
