Extracting specific text from a value in nested dictionaries with Python

I have the following data structure. From the value of each "state" key, I need to extract the number together with its unit (ft, mi, FT, or MI) and store it in a new key called "distance".

Reproducible example of my data:

[
    {
        "id": 1243,
        "class1": [
            {"count":5,
                "state": "Arizona 4.47ft"
            },
            {
                "state": "Georgia 1023mi"
            }
        ]
    },
    {
        "id": 12438,
        "class1": [
            {"count":2,
                "state": "Newyork 2022 NY 74.6 FT"
            },
            {
                "state": "Indiana 747MI(In)"
            },
            {"count":2,
                "state": "Florida 453mi FL"
            }
        ]
    }
]

A small example of the expected output:

[
    {
        "id": 1243,
        "class1": [
            {"count":5,
                "state": "Arizona 4.47ft",
                "distance":"4.47ft"
            },
            {
                "state": "Georgia 1023mi",
                "distance":"1023mi"
            }
        ]
    }]

The logic that I have built so far:

for a in df['state']:
    for k in a:
        if "state" in k:
            m = ["ft","mi","FT","MI"]
            df['distance']=df['state'].str.extract(r'(\S+\s?(?:%s))\b' % '|'.join(m))

Thank you for your time and have a great day !



Solution 1:[1]

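The code below adds a "distance" key to each entry in the lists stored under keys containing "class". The value is the text captured by the regex pattern " (.*)", which is everything after the first space in the "state" value.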

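import re

# Capture everything after the first space in the "state" string.
pattern = " (.*)"

# `data` is the list of dictionaries shown in the question.
for a in data:
    for k in a:
        if "class" in k:
            for l in a[k]:
                m = re.findall(pattern, l['state'])
                # findall returns [] when the string contains no space.
                l['distance'] = m[0] if m else None
    print(a)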
   


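Adjust the regex pattern if it does not capture the value you need. For example, " (.*)" returns "2022 NY 74.6 FT" for "Newyork 2022 NY 74.6 FT", not just "74.6 FT".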
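One possible way to tighten the pattern is sketched below. It assumes a distance is always a number (with an optional decimal part), an optional single space, and then "ft" or "mi" in any letter case. The names DISTANCE_RE, add_distance, and list_key are made up for this illustration and do not come from the question or the answer above.

```python
import re

# Assumed pattern: a number with an optional decimal part, an optional space,
# then "ft" or "mi" in any letter case, ending at a word boundary.
DISTANCE_RE = re.compile(r"\d+(?:\.\d+)?\s?(?:ft|mi)\b", re.IGNORECASE)


def add_distance(records, list_key="class1"):
    """Add a "distance" key to every entry under `list_key`, in place."""
    for record in records:
        for entry in record.get(list_key, []):
            match = DISTANCE_RE.search(entry.get("state", ""))
            # None marks entries where no distance was found.
            entry["distance"] = match.group(0) if match else None
    return records


# Sample data copied from the question.
data = [
    {"id": 1243, "class1": [
        {"count": 5, "state": "Arizona 4.47ft"},
        {"state": "Georgia 1023mi"},
    ]},
    {"id": 12438, "class1": [
        {"count": 2, "state": "Newyork 2022 NY 74.6 FT"},
        {"state": "Indiana 747MI(In)"},
        {"count": 2, "state": "Florida 453mi FL"},
    ]},
]

add_distance(data)
for record in data:
    print(record)
```

With this pattern, "Newyork 2022 NY 74.6 FT" gives "74.6 FT" and "Indiana 747MI(In)" gives "747MI", because the year and the trailing text do not fit the number-plus-unit shape. If your real data uses other units or thousands separators, the pattern would need to be extended.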

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Baris Ozensel