KeyError when accessing a nested dictionary

I have the following list of nested dictionaries:

raw_data = [
    {
        "type": "message",
        "subtype": "bot_message",
        "text": "This content can't be displayed.",
        "timestamp": "1650905606.755969",
        "username": "admin",
        "bot_id": "BPD4K3SJW",
        "blocks": [
            {
                "type": "section",
                "block_id": "BJNTn",
                "text": {
                    "type": "mrkdwn",
                    "text": "You have a new message.",
                    "verbatim": False,
                },
            },
            {
                "type": "section",
                "block_id": "WPn/l",
                "text": {
                    "type": "mrkdwn",
                    "text": "*Heard By*\nFriend",
                    "verbatim": False,
                },
            },
            {
                "type": "section",
                "block_id": "5yp",
                "text": {
                    "type": "mrkdwn",
                    "text": "*Which Direction? *\nNorth",
                    "verbatim": False,
                },
            },
            {
                "type": "section",
                "block_id": "fKEpF",
                "text": {
                    "type": "mrkdwn",
                    "text": "*Which Destination*\nNew York",
                    "verbatim": False,
                },
            },
            {
                "type": "section",
                "block_id": "qjAH",
                "text": {
                    "type": "mrkdwn",
                    "text": "*New Customer:*\Yes",
                    "verbatim": False,
                },
            },
            # problem code chunk below
            {
                "type": "actions",
                "block_id": "yt4",
                "elements": [
                    {
                        "type": "button",
                        "action_id": "+bc",
                        "text": {
                            "type": "plain_text",
                            "bar": "View results",
                            "emoji": True,
                        },
                        "url": "www.example.com/results",
                    }
                ],
            },
            # problem code chunk above
            {
                "type": "section",
                "block_id": "IBr",
                "text": {"type": "mrkdwn", "text": " ", "verbatim": False},
            },
        ],
    },
    {
        "type": "message",
        "subtype": "bot_message",
        "text": "This content can't be displayed.",
        "timestamp": "1650899428.077709",
        "username": "admin",
        "bot_id": "BPD4K3SJW",
        "blocks": [
            {
                "type": "section",
                "block_id": "Smd",
                "text": {
                    "type": "mrkdwn",
                    "text": "You have a new message.",
                    "verbatim": False,
                },
            },
            {
                "type": "section",
                "block_id": "6YaLt",
                "text": {
                    "type": "mrkdwn",
                    "text": "*Heard By*\nOnline Search",
                    "verbatim": False,
                },
            },
            {
                "type": "section",
                "block_id": "w3o",
                "text": {
                    "type": "mrkdwn",
                    "text": "*Which Direction: *\nNorth",
                    "verbatim": False,
                },
            },
            {
                "type": "section",
                "block_id": "PTQ",
                "text": {
                    "type": "mrkdwn",
                    "text": "*Which Destination? *\nMiami",
                    "verbatim": False,
                },
            },
            {
                "type": "section",
                "block_id": "JCfSP",
                "text": {
                    "type": "mrkdwn",
                    "text": "*New Customer? *\nNo",
                    "verbatim": False,
                },
            },
            # problem code chunk below
            {
                "type": "actions",
                "block_id": "yt4",
                "elements": [
                    {
                        "type": "button",
                        "action_id": "+bc",
                        "text": {
                            "type": "plain_text",
                            "bar": "View results",
                            "emoji": True,
                        },
                        "url": "www.example.com/results",
                    }
                ],
            },
            # problem code chunk above
            {
                "type": "section",
                "block_id": "RJOA",
                "text": {"type": "mrkdwn", "text": " ", "verbatim": False},
            },
        ],
    },
]

My goal is to produce a Pandas dataframe that looks as follows:

    heard_by         direction   destination       new_customer
0   Friend           North       New York          Yes
1   Online Search    North       Miami             No

To do so, I use the following:

import re
import pandas as pd

d_new = (pd.DataFrame([[re.sub(".*[*]\\W+", "", val['text']['text'])
               for val in dat['blocks']] for dat in raw_data])
          .drop([0, 5], axis=1))

d_new.columns = ['heard_by', 'direction','destination', 'new_customer']

d_new

Unfortunately, this raises a KeyError:

KeyError: 'text'

The code does work, but only if we comment out the following chunk in each message above:

#    {'type': 'actions',
#    'block_id': 'yt4',
#    'elements': [{'type': 'button',
#      'action_id': '+bc',
#      'text': {'type': 'plain_text', 'bar': 'View results', 'emoji': True},
#      'url': 'www.example.com/results'}]},

How do we adapt the code to handle this use case?

Thanks!



Solution 1:[1]

Keep only the blocks that have a "text" key, then slice out the four answer fields with [1:5]:

>>> pd.DataFrame(data=[[re.sub(".*[*]\\W+", "", val['text']['text']) for val in dat['blocks'] if val.get('text')][1:5] for dat in raw_data],
                 columns=['heard_by', 'direction','destination', 'new_customer'])

        heard_by direction destination new_customer
0         Friend     North    New York          Yes
1  Online Search     North       Miami           No
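
To see why this filter avoids the error, here is a minimal sketch using two trimmed blocks in the same shape as the question's data (the variable names section, actions, raised, missing, and kept are made up for illustration). Subscripting a dict with a missing key raises KeyError, while dict.get returns None, which the `if` clause treats as false:

```python
# Two trimmed blocks shaped like the ones in the question.
section = {"type": "section", "text": {"type": "mrkdwn", "text": "*Heard By*\nFriend"}}
actions = {"type": "actions", "elements": [{"type": "button", "url": "www.example.com/results"}]}

# Subscripting raises KeyError when the key is absent...
try:
    actions["text"]
    raised = False
except KeyError:
    raised = True

# ...whereas .get() quietly returns None.
missing = actions.get("text")

# So the comprehension's filter keeps only blocks that carry a "text" entry.
kept = [block["type"] for block in (section, actions) if block.get("text")]
```

Note that `val.get('text')` also drops blocks whose "text" value is empty or otherwise falsy; `'text' in val` would be the stricter membership test.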

Solution 2:[2]

Since you're not grabbing anything from the "problem chunks", just skip them entirely:

parsed = [[re.sub(".*[*]\\W+", "", val['text']['text']) for val in dat['blocks'] if val["type"] != "actions"] for dat in raw_data]

d_new = pd.DataFrame(parsed).drop([0, 5], axis=1)
d_new.columns = ['heard_by', 'direction','destination', 'new_customer']

Output:

        heard_by direction destination new_customer
0         Friend     North    New York          Yes
1  Online Search     North       Miami           No

For what it's worth, when your comprehensions start getting this messy it's best to just write a standard for loop, which is much easier to understand and debug:

parsed = []
for dat in raw_data:
    new_row = []
    for val in dat["blocks"]:
        if val["type"] != "actions":
            new_row.append(re.sub(".*[*]\\W+", "", val['text']['text']))
    parsed.append(new_row)

As an aside, how and where did you get these data? They're awfully inconsistent in format:

*Heard By*
Friend
*Which Direction? *
North
*Which Destination*
New York
*New Customer:*\Yes  # why is there a backslash here? Was it supposed to be '\n'?

*Heard By*
Online Search
*Which Direction: *
North
*Which Destination? *
Miami
*New Customer? *
No

That makes it very difficult to write a more elegant solution.
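
One possible way around the inconsistent labels is to key each value by a normalised label instead of by position. This is only a sketch under assumptions: parse_block and parse_message are hypothetical helper names, the sample message is trimmed from the question's data, and the normalisation rule (lowercase, strip punctuation, drop the word "which") happens to fit these labels and may need adjusting for others.

```python
import re

def parse_block(text):
    """Split '*Label*<separator>Value' into (column_name, value), or return None."""
    match = re.match(r"\*(.+?)\*\W*(.*)", text, flags=re.DOTALL)
    if match is None:
        return None
    label, value = match.groups()
    # Assumed normalisation: lowercase, keep letters and spaces, drop "which".
    label = re.sub(r"[^a-z ]", "", label.lower()).replace("which", "").strip()
    return label.replace(" ", "_"), value.strip()

def parse_message(message):
    """Collect every labelled block of one message into a dict."""
    row = {}
    for block in message["blocks"]:
        # Blocks without a top-level "text" dict (e.g. "actions") yield "".
        text = block.get("text", {}).get("text", "")
        parsed = parse_block(text)
        if parsed:
            key, value = parsed
            row[key] = value
    return row

# Trimmed sample in the same shape as one entry of raw_data.
sample = {
    "blocks": [
        {"type": "section", "text": {"type": "mrkdwn", "text": "You have a new message."}},
        {"type": "section", "text": {"type": "mrkdwn", "text": "*Heard By*\nFriend"}},
        {"type": "section", "text": {"type": "mrkdwn", "text": "*Which Direction? *\nNorth"}},
        {"type": "section", "text": {"type": "mrkdwn", "text": "*Which Destination*\nNew York"}},
        {"type": "section", "text": {"type": "mrkdwn", "text": "*New Customer:*\\Yes"}},
        {"type": "actions", "elements": [{"type": "button", "url": "www.example.com/results"}]},
        {"type": "section", "text": {"type": "mrkdwn", "text": " "}},
    ]
}

rows = [parse_message(message) for message in [sample]]
```

With the full data, `pd.DataFrame([parse_message(m) for m in raw_data])` should give one column per label, with no need to drop columns by position.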

Solution 3:[3]

The issue is that the problem blocks have no top-level 'text' key; their 'text' keys sit inside the list stored under the 'elements' key. You could write a function that checks whether a block has a 'text' or an 'elements' key and returns the appropriate value.
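
A rough sketch of such a function follows. The name block_text and the choice to return the first element's text are assumptions made for illustration, not part of the original answer. Note that in the data as posted, the button's inner dict stores its label under 'bar' rather than 'text', so the helper returns None for those blocks.

```python
def block_text(block):
    """Return the display text of a block, or None if it has none."""
    if "text" in block:
        # Section blocks: the string lives at block["text"]["text"].
        return block["text"].get("text")
    if "elements" in block:
        # Action blocks: look inside each element for a nested "text" string.
        for element in block["elements"]:
            inner = element.get("text", {})
            if "text" in inner:
                return inner["text"]
    return None

# Trimmed blocks shaped like the question's data.
section = {"type": "section", "text": {"type": "mrkdwn", "text": "*Heard By*\nFriend"}}
actions_as_posted = {
    "type": "actions",
    "elements": [{"type": "button", "text": {"type": "plain_text", "bar": "View results"}}],
}
# Hypothetical variant where the label is stored under "text".
actions_with_text = {
    "type": "actions",
    "elements": [{"type": "button", "text": {"type": "plain_text", "text": "View results"}}],
}

results = [block_text(b) for b in (section, actions_as_posted, actions_with_text)]
```

Blocks for which the helper returns None can then be skipped before applying re.sub.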

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution     Source
Solution 1   not_speshal
Solution 2
Solution 3   jh316