Extracting specific text from a value in nested dictionaries with Python
I have the following data structure. From each "state" value I need to extract the token that carries a unit (ft, mi, FT, or MI) and store it under a new key called "distance".
Reproducible example of my data:
[
  {
    "id": 1243,
    "class1": [
      {
        "count": 5,
        "state": "Arizona 4.47ft"
      },
      {
        "state": "Georgia 1023mi"
      }
    ]
  },
  {
    "id": 12438,
    "class1": [
      {
        "count": 2,
        "state": "Newyork 2022 NY 74.6 FT"
      },
      {
        "state": "Indiana 747MI(In)"
      },
      {
        "count": 2,
        "state": "Florida 453mi FL"
      }
    ]
  }
]
A small example of the expected output:
[
  {
    "id": 1243,
    "class1": [
      {
        "count": 5,
        "state": "Arizona 4.47ft",
        "distance": "4.47ft"
      },
      {
        "state": "Georgia 1023mi",
        "distance": "1023mi"
      }
    ]
  }
]
The logic that I have built so far:
for a in df['state']:
    for k in a:
        if "state" in k:
            m = ["ft", "mi", "FT", "MI"]
            df['distance'] = df['state'].str.extract(r'(\S+\s?(?:%s))\b' % '|'.join(m))
Thank you for your time and have a great day!
Solution 1:[1]
The code below adds a "distance" key to each entry and fills it using the regex pattern " (.*)", which captures everything after the first space in the "state" value.
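import re

# Captures everything after the first space in the string.
pattern = " (.*)"

# `data` is the list of dictionaries shown in the question.
for a in data:
    for k in a:
        # Only keys such as "class1" hold the nested list of entries.
        if "class" in k:
            for l in a[k]:
                m = re.findall(pattern, l['state'])
                # findall returns [] when the string has no space, so guard the index.
                l['distance'] = m[0] if m else None
    print(a)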
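For values with extra words, such as "Newyork 2022 NY 74.6 FT", this pattern returns more than the distance ("2022 NY 74.6 FT"), so adjust the regex if it does not give the right text for your data.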
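As one possible adjustment, the sketch below swaps in a stricter pattern that requires a number directly before the unit. It is not part of the original answer: the names `DISTANCE_RE`, `add_distance`, and `list_key` are hypothetical, and it assumes every distance starts with digits (optionally with a decimal part) and that the nested list always lives under "class1".

```python
import re

# Units come from the question; requiring a numeric prefix is an assumption.
UNITS = ["ft", "mi", "FT", "MI"]
DISTANCE_RE = re.compile(r"\d+(?:\.\d+)?\s?(?:%s)\b" % "|".join(UNITS))


def add_distance(records, list_key="class1"):
    """Add a 'distance' key to every nested entry, in place."""
    for record in records:
        for entry in record.get(list_key, []):
            match = DISTANCE_RE.search(entry.get("state", ""))
            # None marks entries where no distance was found.
            entry["distance"] = match.group(0) if match else None
    return records


# Sample data copied from the question.
data = [
    {"id": 1243, "class1": [
        {"count": 5, "state": "Arizona 4.47ft"},
        {"state": "Georgia 1023mi"},
    ]},
    {"id": 12438, "class1": [
        {"count": 2, "state": "Newyork 2022 NY 74.6 FT"},
        {"state": "Indiana 747MI(In)"},
        {"count": 2, "state": "Florida 453mi FL"},
    ]},
]

add_distance(data)
for record in data:
    print(record["id"], [e["distance"] for e in record["class1"]])
```

On the sample data this prints `1243 ['4.47ft', '1023mi']` and `12438 ['74.6 FT', '747MI', '453mi']`. Anchoring on digits also avoids matching a place name that merely ends in "mi" or "ft", which a pattern starting with `\S+` would accept.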
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Baris Ozensel |

