KeyError when accessing a nested dictionary
I have the following list of nested dictionaries:
raw_data = [
{
"type": "message",
"subtype": "bot_message",
"text": "This content can't be displayed.",
"timestamp": "1650905606.755969",
"username": "admin",
"bot_id": "BPD4K3SJW",
"blocks": [
{
"type": "section",
"block_id": "BJNTn",
"text": {
"type": "mrkdwn",
"text": "You have a new message.",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "WPn/l",
"text": {
"type": "mrkdwn",
"text": "*Heard By*\nFriend",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "5yp",
"text": {
"type": "mrkdwn",
"text": "*Which Direction? *\nNorth",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "fKEpF",
"text": {
"type": "mrkdwn",
"text": "*Which Destination*\nNew York",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "qjAH",
"text": {
"type": "mrkdwn",
"text": "*New Customer:*\Yes",
"verbatim": False,
},
},
# problem code chunk below
{
"type": "actions",
"block_id": "yt4",
"elements": [
{
"type": "button",
"action_id": "+bc",
"text": {
"type": "plain_text",
"bar": "View results",
"emoji": True,
},
"url": "www.example.com/results",
}
],
},
# problem code chunk above
{
"type": "section",
"block_id": "IBr",
"text": {"type": "mrkdwn", "text": " ", "verbatim": False},
},
],
},
{
"type": "message",
"subtype": "bot_message",
"text": "This content can't be displayed.",
"timestamp": "1650899428.077709",
"username": "admin",
"bot_id": "BPD4K3SJW",
"blocks": [
{
"type": "section",
"block_id": "Smd",
"text": {
"type": "mrkdwn",
"text": "You have a new message.",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "6YaLt",
"text": {
"type": "mrkdwn",
"text": "*Heard By*\nOnline Search",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "w3o",
"text": {
"type": "mrkdwn",
"text": "*Which Direction: *\nNorth",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "PTQ",
"text": {
"type": "mrkdwn",
"text": "*Which Destination? *\nMiami",
"verbatim": False,
},
},
{
"type": "section",
"block_id": "JCfSP",
"text": {
"type": "mrkdwn",
"text": "*New Customer? *\nNo",
"verbatim": False,
},
},
# problem code chunk below
{
"type": "actions",
"block_id": "yt4",
"elements": [
{
"type": "button",
"action_id": "+bc",
"text": {
"type": "plain_text",
"bar": "View results",
"emoji": True,
},
"url": "www.example.com/results",
}
],
},
# problem code chunk above
{
"type": "section",
"block_id": "RJOA",
"text": {"type": "mrkdwn", "text": " ", "verbatim": False},
},
],
},
]
My goal is to produce a Pandas dataframe that looks as follows:
heard_by direction destination new_customer
0 Friend North New York Yes
1 Online Search North Miami No
To do so, I use the following:
import re
import pandas as pd

d_new = (pd.DataFrame([[re.sub(".*[*]\\W+", "", val['text']['text'])
                        for val in dat['blocks']] for dat in raw_data])
         .drop([0, 5], axis=1))
d_new.columns = ['heard_by', 'direction', 'destination', 'new_customer']
d_new
Unfortunately, this raises a KeyError:
KeyError: 'text'
The code does work, but only if I comment out the following chunk in each message in the list above:
# {'type': 'actions',
# 'block_id': 'yt4',
# 'elements': [{'type': 'button',
# 'action_id': '+bc',
# 'text': {'type': 'plain_text', 'bar': 'View results', 'emoji': True},
# 'url': 'www.example.com/results'}]},
How do we adapt the code to handle this use case?
Thanks!
Solution 1:[1]
Keep only the blocks that have a "text" key:
>>> pd.DataFrame(data=[[re.sub(".*[*]\\W+", "", val['text']['text']) for val in dat['blocks'] if val.get('text')][1:5] for dat in raw_data],
...              columns=['heard_by', 'direction', 'destination', 'new_customer'])
heard_by direction destination new_customer
0 Friend North New York Yes
1 Online Search North Miami No
Solution 2:[2]
Since you're not grabbing anything from the "problem chunks", just skip them entirely:
parsed = [[re.sub(".*[*]\\W+", "", val['text']['text']) for val in dat['blocks'] if val["type"] != "actions"] for dat in raw_data]
d_new = pd.DataFrame(parsed).drop([0, 5], axis=1)
d_new.columns = ['heard_by', 'direction', 'destination', 'new_customer']
Output:
heard_by direction destination new_customer
0 Friend North New York Yes
1 Online Search North Miami No
For what it's worth, when your comprehensions start getting this messy it's best to just write a standard for loop, which is much easier to understand and debug:
parsed = []
for dat in raw_data:
    new_row = []
    for val in dat["blocks"]:
        # skip the "actions" blocks, which have no top-level "text" key
        if val["type"] != "actions":
            new_row.append(re.sub(".*[*]\\W+", "", val['text']['text']))
    parsed.append(new_row)
As an aside, how and where did you get these data? They're awfully inconsistent in format:
*Heard By*
Friend
*Which Direction? *
North
*Which Destination*
New York
*New Customer:*\Yes # why is there a backslash here? Was it supposed to be '\n'?
*Heard By*
Online Search
*Which Direction: *
North
*Which Destination? *
Miami
*New Customer? *
No
That makes it very difficult to write a more elegant solution.
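One way to cope with the inconsistent labels, offered only as a sketch and not as part of the answer above, is to match each bold label by keyword instead of relying on column position. The names COLUMN_KEYWORDS, FIELD_PATTERN, parse_field and parse_row are hypothetical, and the keyword choices are assumptions based solely on the two sample messages in the question.

```python
import re

# Hypothetical mapping from a keyword in the bold label to a column name.
COLUMN_KEYWORDS = {
    "heard": "heard_by",
    "direction": "direction",
    "destination": "destination",
    "customer": "new_customer",
}

# A bold label between asterisks, then any separator characters, then the value.
FIELD_PATTERN = re.compile(r"^\*(?P<label>[^*]+)\*\W*(?P<value>.*)$", re.DOTALL)


def parse_field(text):
    """Return (column, value) for a labelled field, or None if nothing matches."""
    match = FIELD_PATTERN.match(text)
    if match is None:
        return None
    label = match.group("label").lower()
    for keyword, column in COLUMN_KEYWORDS.items():
        if keyword in label:
            return column, match.group("value").strip()
    return None


def parse_row(texts):
    """Build one row dict from the text strings of a single message."""
    row = {}
    for text in texts:
        field = parse_field(text)
        if field is not None:
            column, value = field
            row[column] = value
    return row


# Strings copied from the first sample message; the stray backslash is kept.
sample_texts = [
    "You have a new message.",
    "*Heard By*\nFriend",
    "*Which Direction? *\nNorth",
    "*Which Destination*\nNew York",
    "*New Customer:*\\Yes",
    " ",
]
rows = [parse_row(sample_texts)]
```

The resulting list of dicts can be passed to pd.DataFrame(rows). A field that is missing from a message would then appear as NaN rather than shifting the other columns.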
Solution 3:[3]
The issue seems to be that the problem blocks don't have a top-level 'text' key. Their 'text' dictionaries are nested inside the list stored under the 'elements' key instead. You could write a function that checks whether a block has a 'text' or an 'elements' key and returns the appropriate value.
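A minimal sketch of such a function might look like the following. The name extract_texts is hypothetical, and the fallback to the "bar" key is an assumption drawn from the sample data, where the button label is stored under "bar" rather than "text".

```python
import re


def extract_texts(block):
    """Return the display strings found in one block (hypothetical helper)."""
    if "text" in block:
        # Section blocks keep their string at block["text"]["text"].
        value = block["text"].get("text")
        return [value] if value is not None else []
    # Actions blocks nest a "text" dict inside each entry of "elements".
    texts = []
    for element in block.get("elements", []):
        inner = element.get("text", {})
        # Assumption: fall back to "bar", the key used in the sample data.
        label = inner.get("text", inner.get("bar"))
        if label is not None:
            texts.append(label)
    return texts


# Two blocks shaped like those in the question, trimmed for brevity.
sample_blocks = [
    {"type": "section", "text": {"type": "mrkdwn", "text": "*Heard By*\nFriend"}},
    {
        "type": "actions",
        "elements": [
            {"type": "button", "text": {"type": "plain_text", "bar": "View results"}}
        ],
    },
]

# Same cleanup regex as in the question, applied to every extracted string.
cleaned = [
    re.sub(".*[*]\\W+", "", text)
    for block in sample_blocks
    for text in extract_texts(block)
]
```

Because the helper returns a list, a block without any usable text contributes nothing instead of raising a KeyError. If the button label is not wanted in the final dataframe, it still has to be dropped afterwards.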
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | not_speshal |
| Solution 2 | |
| Solution 3 | jh316 |
