Parse a file which is a list of objects in Python [closed]
I have a JSON-like file in the format below. I would like to store the BLEU scores in one list and the chrF2++ scores in another.
The file format:
[
{
"name": "BLEU",
"score": 38.8,
"signature": "nrefs:1|case:lc|eff:no|tok:13a|smooth:exp|version:2.0.0",
"verbose_score": "75.0/45.5/30.0/22.2 (BP = 1.000 ratio = 1.000 hyp_len = 12 ref_len = 12)",
"nrefs": "1",
"case": "lc",
"eff": "no",
"tok": "13a",
"smooth": "exp",
"version": "2.0.0"
},
{
"name": "chrF2++",
"score": 49.6,
"signature": "nrefs:1|case:mixed|eff:yes|nc:6|nw:2|space:no|version:2.0.0",
"nrefs": "1",
"case": "mixed",
"eff": "yes",
"nc": "6",
"nw": "2",
"space": "no",
"version": "2.0.0"
}
]
[
{
"name": "BLEU",
"score": 19.2,
"signature": "nrefs:1|case:lc|eff:no|tok:13a|smooth:exp|version:2.0.0",
"verbose_score": "61.5/33.3/18.2/5.0 (BP = 0.926 ratio = 0.929 hyp_len = 13 ref_len = 14)",
"nrefs": "1",
"case": "lc",
"eff": "no",
"tok": "13a",
"smooth": "exp",
"version": "2.0.0"
},
{
"name": "chrF2++",
"score": 38.8,
"signature": "nrefs:1|case:mixed|eff:yes|nc:6|nw:2|space:no|version:2.0.0",
"nrefs": "1",
"case": "mixed",
"eff": "yes",
"nc": "6",
"nw": "2",
"space": "no",
"version": "2.0.0"
}
]
....
I tried:
with open(sys.argv[1]) as f:
    for jsonObj in f:
        list_of_scores = json.loads(jsonObj)
        print(list_of_scores)
        bleuScores.append(list_of_scores[0])
        chrfScores.append(list_of_scores[1])
but it did not work.
Solution 1:[1]
Your data format is almost JSON, except that it appears you're getting multiple lists in a single file, without structure around them:
Your format, abbreviated:
[
{"some": "dict"}
]
[
{"some": "dict"}
]
Valid JSON:
[
[
{"some": "dict"}
],
[
{"some": "dict"}
]
]
So one approach is to wrap the full content in square brackets and to replace every closing square bracket that is followed by nothing but whitespace (including newlines) and then an opening square bracket with "],[".
A limitation of this approach is that a string value like "oh ] [ no" would also be modified, so you may additionally need to skip anything inside double quotes, but that goes beyond the scope of your question.
A solution might look like:
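import re
import json


def fix_content(s):
    # Put a comma between back-to-back lists: "]<whitespace>[" becomes "],\n[".
    # \s* matches any amount of whitespace, including newlines and indentation.
    s = re.sub(r'\]\s*\[', '],\n[', s)
    # Wrap everything in one outer list so the result is valid JSON.
    return f'[{s}]'


with open('mess.json') as f:
    data = json.loads(fix_content(f.read()))

for some_list in data:
    for d in some_list:
        print(d)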
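The limitation with brackets inside quoted strings can be sidestepped by letting the JSON parser itself find where each list ends. The following is a minimal sketch, not part of the original answer: it uses `json.JSONDecoder.raw_decode` from the standard library, which parses one value starting at a given index and returns the index where that value ended. The function name `parse_concatenated_json` and the `sample` string are made up for illustration.

```python
import json


def parse_concatenated_json(text):
    """Parse several JSON values written back to back in one string."""
    decoder = json.JSONDecoder()
    values = []
    pos = 0
    while pos < len(text):
        # raw_decode does not skip leading whitespace, so skip it by hand.
        while pos < len(text) and text[pos].isspace():
            pos += 1
        if pos == len(text):
            break
        # Returns the parsed value and the index just past its end.
        value, pos = decoder.raw_decode(text, pos)
        values.append(value)
    return values


# Hypothetical sample: two lists back to back, one containing "] [" in a string.
sample = (
    '[{"name": "BLEU", "score": 38.8}]\n'
    '[{"name": "BLEU", "score": 19.2, "note": "oh ] [ no"}]'
)
data = parse_concatenated_json(sample)
print(len(data))  # 2
```

The result has the same shape as `data` above (a list of lists of dicts), and no text substitution is done on the input, so string values are left untouched.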
Getting those 2 lists of scores:
BLEUs, chrF2s = zip(*((d['BLEU'], d['chrF2++'])
                      for d in (dict((d['name'], d['score'])
                                     for d in part) for part in data)))
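The same extraction can also be written as an explicit loop, which may be easier to read. This is a sketch under the assumption that every inner list contains one entry named "BLEU" and one named "chrF2++", as in the question. The function name `split_scores` and the inline `data` are illustrative only.

```python
def split_scores(data):
    """Collect BLEU and chrF2++ scores from a list of lists of metric dicts."""
    bleu_scores, chrf_scores = [], []
    for part in data:
        # Map each metric name to its score for this pair of results.
        by_name = {d["name"]: d["score"] for d in part}
        bleu_scores.append(by_name["BLEU"])
        chrf_scores.append(by_name["chrF2++"])
    return bleu_scores, chrf_scores


# Trimmed-down version of the data from the question.
data = [
    [{"name": "BLEU", "score": 38.8}, {"name": "chrF2++", "score": 49.6}],
    [{"name": "BLEU", "score": 19.2}, {"name": "chrF2++", "score": 38.8}],
]
bleu_scores, chrf_scores = split_scores(data)
print(bleu_scores)  # [38.8, 19.2]
print(chrf_scores)  # [49.6, 38.8]
```

Unlike the `zip` version, this returns lists rather than tuples and also works when `data` is empty.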
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
