"string indices must be integers", but the format is a dictionary and the type is a list

I have a file that I would like to iterate over as a dictionary to extract the doc ID, but it does not work; it seems to be a list. I have tried many things. I have the file in both JSON and text format; here I used the text file. I just want to extract the doc ID.

['[{"doc_id":"C00001","LIWC_WC":4051,"LIWC_Analytic":88.89,"LIWC_Clout":76.97,"LIWC_Authentic":13.98,"LIWC_Tone":5.22,"LIWC_WPS":22.26,"LIWC_Sixltr":27.18,"LIWC_Dic":79.26,"LIWC_function":46.63,"LIWC_pronoun":9.78,"LIWC_ppron":4.2,"LIWC_i":0.17,"LIWC_we":0.42,"LIWC_you":0.02,"LIWC_shehe":2.42,"LIWC_they":1.16,"LIWC_ipron":5.58,"LIWC_article":6.96,"LIWC_prep":15.77,"LIWC_auxverb":6.81,"LIWC_adverb":3.33,"LIWC_conj":6.07,"LIWC_negate":0.59,"LIWC_verb":10.54,"LIWC_adj":4.32,"LIWC_compare":2.72,"LIWC_interrog":
for d in data:
    print (d['doc_id'])
    

for d in data:
    for key in d:
        print (d[key])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-57622719c3c4> in <module>
      1 for d in data:
----> 2     print (d['doc_id'])
      3 

TypeError: string indices must be integers



Solution 1:[1]

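Other than the unclosed brackets and curly braces, your JSON is valid. The problem is that your list contains a JSON string, not a dict, so each d in your loop is a str, and indexing a string with 'doc_id' raises the TypeError. Parse each string with json.loads first: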

import json

dictList = ['[{"doc_id":"C00001","LIWC_WC":4051,"LIWC_Analytic":88.89,"LIWC_Clout":76.97,"LIWC_Authentic":13.98,"LIWC_Tone":5.22,"LIWC_WPS":22.26,"LIWC_Sixltr":27.18,"LIWC_Dic":79.26,"LIWC_function":46.63,"LIWC_pronoun":9.78,"LIWC_ppron":4.2,"LIWC_i":0.17,"LIWC_we":0.42,"LIWC_you":0.02,"LIWC_shehe":2.42,"LIWC_they":1.16,"LIWC_ipron":5.58,"LIWC_article":6.96,"LIWC_prep":15.77,"LIWC_auxverb":6.81,"LIWC_adverb":3.33,"LIWC_conj":6.07,"LIWC_negate":0.59,"LIWC_verb":10.54,"LIWC_adj":4.32,"LIWC_compare":2.72}]']
for data in dictList:
    # Each element is a JSON string; parse it into a list of dicts
    records = json.loads(data)
    # Loop inside the outer loop so every string is processed, not just the last
    for record in records:
        print(record['doc_id'])

Output

C00001
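To see exactly where the TypeError comes from, here is a small sketch that prints the types involved. It uses a shortened record with only doc_id and LIWC_WC (an assumption for brevity); the real data has many more keys but behaves the same way.

```python
import json

# Shortened, hypothetical version of the question's data: a list whose
# single element is a JSON *string*, not a dict
raw = ['[{"doc_id":"C00001","LIWC_WC":4051}]']

for d in raw:
    print(type(d).__name__)  # str: this is why d['doc_id'] fails
    try:
        d['doc_id']
    except TypeError as exc:
        # Same error as in the question (exact wording varies by Python version)
        print('TypeError:', exc)

    # Parsing the string gives a list of dicts, which can be indexed by key
    records = json.loads(d)
    print(type(records).__name__, type(records[0]).__name__)
    print(records[0]['doc_id'])
```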

To print all the keys in the dictionary:

import json

dictList = ['[{"doc_id":"C00001","LIWC_WC":4051,"LIWC_Analytic":88.89,"LIWC_Clout":76.97,"LIWC_Authentic":13.98,"LIWC_Tone":5.22,"LIWC_WPS":22.26,"LIWC_Sixltr":27.18,"LIWC_Dic":79.26,"LIWC_function":46.63,"LIWC_pronoun":9.78,"LIWC_ppron":4.2,"LIWC_i":0.17,"LIWC_we":0.42,"LIWC_you":0.02,"LIWC_shehe":2.42,"LIWC_they":1.16,"LIWC_ipron":5.58,"LIWC_article":6.96,"LIWC_prep":15.77,"LIWC_auxverb":6.81,"LIWC_adverb":3.33,"LIWC_conj":6.07,"LIWC_negate":0.59,"LIWC_verb":10.54,"LIWC_adj":4.32,"LIWC_compare":2.72}]']
for data in dictList:
    # Parse each JSON string into a list of dicts
    for record in json.loads(data):
        # Iterating over a dict yields its keys
        for key in record:
            print(key)
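If dictList holds several JSON strings (for example one per line of the text file, which is an assumption about the file layout), a nested list comprehension can collect every doc_id in one pass. The records below are shortened, and the second one is made up purely for illustration:

```python
import json

# Hypothetical input: two JSON strings, each encoding a list of records
dictList = [
    '[{"doc_id":"C00001","LIWC_WC":4051}]',
    '[{"doc_id":"C00002","LIWC_WC":1200}]',
]

# Parse each string, then pull doc_id out of every record it contains
doc_ids = [record['doc_id'] for data in dictList for record in json.loads(data)]
print(doc_ids)  # ['C00001', 'C00002']
```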

Solution 2:[2]

SuperStormer is right.

import json
data = json.loads('[{"doc_id":"C00001","LIWC_WC":4051,"LIWC_Analytic":88.89,"LIWC_Clout":76.97,"LIWC_Authentic":13.98,"LIWC_Tone":5.22,"LIWC_WPS":22.26,"LIWC_Sixltr":27.18,"LIWC_Dic":79.26,"LIWC_function":46.63,"LIWC_pronoun":9.78,"LIWC_ppron":4.2,"LIWC_i":0.17,"LIWC_we":0.42,"LIWC_you":0.02,"LIWC_shehe":2.42,"LIWC_they":1.16,"LIWC_ipron":5.58,"LIWC_article":6.96,"LIWC_prep":15.77,"LIWC_auxverb":6.81,"LIWC_adverb":3.33,"LIWC_conj":6.07,"LIWC_negate":0.59,"LIWC_verb":10.54,"LIWC_adj":4.32,"LIWC_compare":2.72}]')

Now data is a list that contains a single dict, so data[0]['doc_id'] gives 'C00001'. Based on what you copied in from the file, this doesn't seem to need to be a list, so you could also remove the square brackets. The json module is essential for converting these strings into more useful data types.
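Since the question mentions that the data is also available as a JSON file, json.load can parse a file object directly instead of going through a list of strings. This is a minimal sketch, assuming the file contains either a JSON array of records or a single record object (as it would after removing the brackets). The function name extract_doc_ids and the file name liwc.json are hypothetical; the demo uses io.StringIO in place of a real file.

```python
import io
import json

def extract_doc_ids(fp):
    """Return the doc_id of every record in a JSON file object (hypothetical helper)."""
    parsed = json.load(fp)  # parse the whole file into Python objects
    # Accept a single object as well as a list of objects
    records = [parsed] if isinstance(parsed, dict) else parsed
    return [record['doc_id'] for record in records]

# Usage with a real file (hypothetical name):
# with open('liwc.json', encoding='utf-8') as f:
#     print(extract_doc_ids(f))

# Self-contained demo with in-memory text instead of a file
print(extract_doc_ids(io.StringIO('[{"doc_id":"C00001","LIWC_WC":4051}]')))  # ['C00001']
print(extract_doc_ids(io.StringIO('{"doc_id":"C00001","LIWC_WC":4051}')))    # ['C00001']
```

Normalizing a single object into a one-element list lets the same loop handle both file shapes.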

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Nathan Wolf