Loop through a list of dictionaries, perform function, and append result to csv

I am trying to loop over a list containing Twitter data in JSON format. The list is made of several dictionaries, each containing data on a politician. The code works if the input json_response only holds data on one politician. However, when json_response is a list of dictionaries, I get an error.

In short, I believe the issue can be isolated to three for-loops in the code: for tweet in json_response['data']:, for dics in json_response['includes']['users']:, and for element in json_response['includes']['media']:.

# Inputs for the request
bearer_token = auth()
headers = create_headers(bearer_token)
keyword = search_query
start_time = "2016-03-01T00:00:00.000Z"
end_time = "2021-03-31T00:00:00.000Z"
max_results = 3000

json_response = [] # empty list that will hold tweet objects
for i in keyword: # loop through the list of politicians in keyword (the search queries) and extract tweets
    url = create_url(i, start_time, end_time, max_results)
    json_response.append(connect_to_endpoint(url[0], headers, url[1]))

I have only pasted the json_response object for 2 out of 30 politicians due to the cap on characters. However, the structure is the same for the remaining 28 politicians.

print(json.dumps(json_response, indent=4, sort_keys=True)) # look at json_response object. 
[
    {
        "data": [
            {
                "author_id": "2877379617",
                "created_at": "2021-03-25T12:11:14.000Z",
                "id": "1375057688355336195",
                "text": "@prettynobodyco She blocked me in 2015 - for pointing out that Tim Kaine enables sexual assault in the military and the evidence was his killing of the MJIA and publicly stated that Military commanders should remain in charge of military rape cases. She's Tanden level awful. Congrats!"
            },
            {
                "author_id": "1265018154444562440",
                "created_at": "2021-03-22T19:48:59.000Z",
                "id": "1374085719472361474",
                "text": "@MehcatCat @AlasscanIsBack @PattyArquette @timkaine Funny, they blocked me. \ud83e\udd23\ud83e\udd23"
            },
            {
                "author_id": "2378324935",
                "created_at": "2021-03-07T21:32:13.000Z",
                "id": "1368675879312887810",
                "text": "@DrWinarick @KatieOGrady4 I apologize for any drama. Katie O Grady blocked me because we had a disagreement about Tim Kaine on one of your older posts. I guess I can't please everyone haha. :/"
            },
            {
                "author_id": "821870502943817729",
                "created_at": "2021-02-12T23:53:59.000Z",
                "id": "1360376637385244673",
                "text": "She blocked me a long ass time ago when I asked her why we shoulf care about Tim Kaine's personal view on abortion if it didn't impact legislation"
            },
            {
                "attachments": {
                    "media_keys": [
                        "16_1341045032732770306"
                    ]
                },
                "author_id": "17232340",
                "created_at": "2020-12-21T15:37:07.000Z",
                "id": "1341045038420275205",
                "text": "@DSingh4Biden @moomintroll8 @timkaine @GovernorVA That's why I replied to you. She blocked me previously, for what silliness I can't remember. Tough being a troll AND a snowflake!"
            }
        ],
        "includes": {
            "media": [
                {
                    "media_key": "16_1341045032732770306",
                    "type": "animated_gif"
                }
            ],
            "users": [
                {
                    "created_at": "2014-11-15T02:23:57.000Z",
                    "description": "",
                    "id": "2877379617",
                    "name": "Laura Saylor",
                    "username": "lauraleesaylor"
                },
                {
                    "created_at": "2020-05-25T20:33:36.000Z",
                    "description": "Weird Writer & Lunatic Linguist\nWicked Witch of the East\nshe/her",
                    "id": "1265018154444562440",
                    "name": "Zauberkind",
                    "username": "Zauberkind2"
                },
                {
                    "created_at": "2014-03-08T07:22:31.000Z",
                    "description": "#Resist, #BLM, #Vaxxed, liberal, autistic, kidney transplant survivor, political nerd, mental health advocate, fighter for equality, truth, justice, etc.",
                    "id": "2378324935",
                    "name": "Trevor \"Trev\" McKee Achilles",
                    "username": "MrTAchilles"
                },
                {
                    "created_at": "2017-01-19T00:02:52.000Z",
                    "description": "statist /  Progressive Gun Nut/ Single and hating it\n\n / \n\nstraight????? /\n\npronouns / brain worm survivor\n\n   \n",
                    "id": "821870502943817729",
                    "name": "Squirrel Dad",
                    "username": "nihilisticpillo"
                },
                {
                    "created_at": "2008-11-07T15:09:46.000Z",
                    "description": "Liberal-Veteran-Dog Lover | Taste for irony, but in moderation | Humor is reason gone mad. ~Groucho Marx | I follow & unfollow back #VeteransResist #Resist",
                    "id": "17232340",
                    "name": "anti-Fascist Jim",
                    "username": "JimnBL"
                }
            ]
        },
        "meta": {
            "newest_id": "1375057688355336195",
            "next_token": "b26v89c19zqg8o3foseug43lzoqdft4ghg78o9sn9ds3h",
            "oldest_id": "1341045038420275205",
            "result_count": 5
        }
    },
    {
        "data": [
            {
                "author_id": "1248251899884814336",
                "created_at": "2021-03-27T13:36:45.000Z",
                "id": "1375803982409576450",
                "text": "@gavinjeffries0 @steven86026859 @MSNBC @SenBooker Uh Oh our friend Steve blocked me, I guess not being able to answer your simple question and being asked to was too much for him."
            },
            {
                "author_id": "293104735",
                "created_at": "2021-02-07T21:45:47.000Z",
                "id": "1358532435122683904",
                "text": "@slwilliams1101 @annabella313 @CrossConnection @TiffanyDCross @Scaramucci @JoyAnnReid @CapehartJ @MSNBC @SenBooker @AliVelshi I stopped watching @TiffanyDCross as well and only watch @CapehartJ now (even though he blocked me in 2016 because I had a \"strong\" response to something mean he said about Hillary Clinton)."
            },
            {
                "author_id": "380970864",
                "created_at": "2021-02-07T20:58:01.000Z",
                "id": "1358520416273326081",
                "text": "@annabella313 @CrossConnection @TiffanyDCross @Scaramucci @JoyAnnReid @CapehartJ @MSNBC After I criticized @TiffanyDCross she blocked me. @JoyAnnReid called herself petty during and interview with @SenBooker.  Why be petty? Be mature and thoughtful so people can learn.  Hosts need to learn too. I only watch @AliVelshi @CapehartJ now."
            },
            {
                "attachments": {
                    "media_keys": [
                        "3_1358448920632909825"
                    ]
                },
                "author_id": "793175035322171397",
                "created_at": "2021-02-07T16:17:44.000Z",
                "id": "1358449876565164034",
                "text": "@FinstaManhattan @SenSchumer @SenBooker @RonWyden Lmao he blocked me over that. His bio said he likes to 'debate & that sometimes he's wrong but he can admit that'.\n\nGuess not.\n\nI wasn't rude or mean at all. This is too funny \ud83e\udd23"
            },
            {
                "author_id": "752266160352010241",
                "created_at": "2021-02-06T20:34:06.000Z",
                "id": "1358152008948195328",
                "text": "@fattypinner @tkbone32221 @SenSchumer @SenBooker @RonWyden He blocked me \ud83e\udd23\ud83d\ude2d\ud83e\udd23\ud83e\udd23\ud83e\udd23\ud83d\ude2d"
            }
        ],
        "includes": {
            "media": [
                {
                    "media_key": "3_1358448920632909825",
                    "type": "photo",
                    "url": ""
                }
            ],
            "users": [
                {
                    "created_at": "2020-04-09T14:11:04.000Z",
                    "description": "",
                    "id": "1248251899884814336",
                    "name": "Firstcomm",
                    "username": "Firstcomm1"
                },
                {
                    "created_at": "2011-05-04T19:26:22.000Z",
                    "description": "Cinephile, balletomane, book lover, tennis fan, K-Drama fanatic, Jang Na-ra fangirl, USC School of Cinematic Arts alumna, Hillary Clinton and Nancy Pelosi Dem.",
                    "id": "293104735",
                    "name": "Joyce Tyler",
                    "username": "joyce_tyler"
                },
                {
                    "created_at": "2011-09-27T14:50:37.000Z",
                    "description": "Spelman College, BA, George Washington University MA, University of South Florida Ph.D. in Political Science, proud Ted Kennedy, Obama, Biden/Harris Democrat!",
                    "id": "380970864",
                    "name": "Stephanie L. Williams, Ph.D.",
                    "username": "slwilliams1101"
                },
                {
                    "created_at": "2016-10-31T19:37:19.000Z",
                    "description": "Loves: life, fam, cats, cars, tattoos, reality TV; collector of t-shirts & Volkswagen\u2019s. Hates: Oxford commas. #CombatVet #Medic #BidenHarris2020 #Resist",
                    "id": "793175035322171397",
                    "name": "Que Sarah Sarah \ud83d\udda4",
                    "username": "sarahalli13"
                },
                {
                    "created_at": "2016-07-10T22:20:03.000Z",
                    "description": "3x Hollywood Video Street Fighter 2 Champion",
                    "id": "752266160352010241",
                    "name": "Sugarcoder",
                    "username": "TheSugarCoder"
                }
            ]
        },
        "meta": {
            "newest_id": "1375803982409576450",
            "next_token": "b26v89c19zqg8o3fosktkdplqiw2q9kzx2ibm4r4y27wd",
            "oldest_id": "1358152008948195328",
            "result_count": 5
        }
    }

...28 other politicians 
# Create file
csvFile = open("tweet_sample.csv", "a", newline="", encoding='utf-8')
csvWriter = csv.writer(csvFile)

# Create headers for the data I want to save. I only want to save these columns in my dataset
csvWriter.writerow(
    ['author id', 'created_at', 'id', 'tweet', 'bio', 'image_url'])
csvFile.close()


def append_to_csv(json_response, fileName):
    # A counter variable
    global created_at, tweet_id, bio, text, author_id
    counter = 0

    # Open OR create the target CSV file
    csvFile = open(fileName, "a", newline="", encoding='utf-8')
    csvWriter = csv.writer(csvFile)

    # Loop through each tweet
    for tweet in json_response[0]['data']: # NOTE adding a 0 gives access to the data for the first politician while adding 1 gives access to data for the second politician and so on...

        # 1. Author ID
        author_id = tweet['author_id']

        # 2. Time created
        created_at = dateutil.parser.parse(tweet['created_at'])

        # 3. Tweet ID
        tweet_id = tweet['id']

        # 4. Tweet text
        text = tweet['text']

    for dics in json_response[0]['includes']['users']: # NOTE 0 added

        # 5. description. Contained in includes data object
        if ('description' in dics):
            bio = dics['description']
        else:
            bio = " "

    for element in json_response[0]['includes']['media']: # NOTE 0 added

        # 6. image url. Contained in includes data object
        if ('url' in element):
            image_url = element['url']
        else:
            image_url = " "

        # Assemble all data in a list
        res = [author_id, created_at, tweet_id, text, bio, image_url]

        # Append the result to the CSV file
        csvWriter.writerow(res)
        counter += 1

    # When done, close the CSV file
    csvFile.close()

    # Print the number of tweets for this iteration
    print("# of Tweets added from this response: ", counter)


append_to_csv(json_response, "tweet_sample.csv")  # Save tweet data in a csv file

Error message: TypeError: list indices must be integers or slices, not str

By adding the [0] in the loop I avoid the TypeError above. However, the output from the function append_to_csv is not ideal, as it only includes the last tweet for the first politician. I guess my loop overwrites data.
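Both symptoms can be reproduced on a made-up two-politician sample (the ids and texts below are placeholders, not real data). This is only a minimal sketch: indexing the outer list with a string raises the TypeError, and assigning inside a loop without writing anything keeps only the last tweet. Iterating over the outer list first avoids both.

```python
# Tiny stand-in for json_response: a list with one dict per politician.
json_response = [
    {"data": [{"id": "1", "text": "first"}, {"id": "2", "text": "second"}]},
    {"data": [{"id": "3", "text": "third"}]},
]

# Indexing the list with a string key reproduces the TypeError.
try:
    json_response["data"]
except TypeError as err:
    error_message = str(err)

# Assigning inside a loop without storing or writing keeps only the last value.
for tweet in json_response[0]["data"]:
    tweet_id = tweet["id"]
last_only = tweet_id  # the id of the first tweet has been overwritten

# Looping over the outer list first, and collecting inside the inner loop,
# keeps every tweet of every politician.
all_ids = [tweet["id"] for response in json_response for tweet in response["data"]]

print(error_message)
print(last_only, all_ids)
```

The same reasoning applies to append_to_csv: the row has to be assembled and written inside the loop over tweets, not after it.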

Desired output would be a data frame with columns author_id, created_at, id, tweet, bio, image_url. Not all users have a bio on their profile or an image_url in their tweet, hence the if-else statements in the function above and the placeholder values no_bio and no_image_url in the desired data frame.

pol_df = pd.read_csv("path_to_tweet_sample.csv" )

pol_df.head()
            author_id                created_at                   id       tweet     bio     image_url
0  737885223858384896  2021-03-26T21:56:02.000Z  1375567243082338314  tweet_text  no_bio  no_image_url
1  847612931487416323  2021-03-26T21:55:24.000Z  1375567083791073283  tweet_text  no_bio  no_image_url
2            18634205  2021-03-08T12:29:00.000Z  1368901564363051010  tweet_text     bio     image_url
3            27327319  2021-03-02T11:53:16.000Z  1366718245521211393  tweet_text     bio  no_image_url
4  917634626247647232  2021-02-28T18:16:45.000Z  1366089974907432961  tweet_text     bio     image_url
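One detail worth noting about the placeholders: in the pasted JSON some users have "description": "" and one media item has "url": "", so the key exists but the value is empty, and a plain 'description' in dics check would keep the empty string. A possible sketch that treats missing and empty values the same way is shown here; value_or_default is a hypothetical helper name, and the user without a description key is invented for illustration.

```python
def value_or_default(record, key, default):
    """Return record[key], or default when the key is missing or the value is empty."""
    return record.get(key) or default


users = [
    {"id": "2877379617", "description": ""},                       # empty bio, as in the sample
    {"id": "17232340", "description": "Liberal-Veteran-Dog Lover"},  # shortened bio
    {"id": "0"},                                                   # hypothetical: key missing
]

bios = [value_or_default(user, "description", "no_bio") for user in users]
print(bios)
```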


Solution 1:[1]

I think you are confusing lists with dicts. Accessing a list like a dict (e.g. data["author_id"]) raises the TypeError you're getting. You have to iterate over the list and then access each dict in it, like [x['author_id'] for x in data], for example. If you want to extract values from the dicts and write them to a csv file you might want to do something like this:

import pandas as pd

# json_response is the list of per-politician response dicts from the question
author_data = []
for data in json_response:
    for tweet in data['data']:
        author_id = tweet['author_id']
        created_at = tweet['created_at']
        tweet_id = tweet['id']
        tweet_text = tweet['text']
        author_data.append([author_id, created_at, tweet_id, tweet_text])

author_df = pd.DataFrame(author_data, columns=['author_id', 'created_at', 'id', 'text'])

media_data = []
for data in json_response:
    # .get() guards against responses without an 'includes' or 'media' section
    for media in data.get('includes', {}).get('media', []):
        url = media.get('url', 'no_url')
        media_data.append([url])

media_df = pd.DataFrame(media_data, columns=['url'])

bio_data = []
for data in json_response:
    for user in data.get('includes', {}).get('users', []):
        bio = user.get('description', '')
        author_id = user['id']
        bio_data.append([bio, author_id])

# The same user can appear in several responses; keep one row per author_id
# so the merge below does not duplicate tweets
bio_df = pd.DataFrame(bio_data, columns=['bio', 'author_id']).drop_duplicates(subset='author_id')

final_df = author_df.merge(bio_df, on="author_id")

print(final_df)

You have to save different parts of the data in different dataframes and then merge them. Note that media does not contain author_id, so media_df cannot be merged on that column. The only link between the ['includes']['media'] part and the ['data'] part is media_key, which appears in a tweet's ['attachments']['media_keys'] when the tweet has media attached.
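As the JSON in the question shows, a tweet with media carries attachments.media_keys, and each entry matches a media_key under includes.media. One possible way to use that link, sketched here with the standard csv module instead of a merge, is to build two lookup dicts per response (user id to bio, media_key to url) and write one row per tweet. The function name write_tweets_csv, the sample ids, and the example.com URL are all made up for illustration, and it is an assumption that a response may lack the includes, users, or media sections.

```python
import csv
import os
import tempfile


def write_tweets_csv(json_response, file_name):
    """Write one CSV row per tweet across all responses; return the row count."""
    header = ["author_id", "created_at", "id", "tweet", "bio", "image_url"]
    count = 0
    with open(file_name, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(header)
        for response in json_response:  # one dict per politician
            includes = response.get("includes", {})
            # Lookup tables for this response: user id -> bio, media_key -> url.
            bios = {u["id"]: u.get("description") for u in includes.get("users", [])}
            urls = {m["media_key"]: m.get("url") for m in includes.get("media", [])}
            for tweet in response.get("data", []):
                media_keys = tweet.get("attachments", {}).get("media_keys", [])
                # Use the first attached media item that has a non-empty url.
                image_url = next((urls[k] for k in media_keys if urls.get(k)), None)
                writer.writerow([
                    tweet["author_id"],
                    tweet["created_at"],
                    tweet["id"],
                    tweet["text"],
                    bios.get(tweet["author_id"]) or "no_bio",
                    image_url or "no_image_url",
                ])
                count += 1
    return count


# Hypothetical sample with the same shape as the question's json_response.
sample = [
    {
        "data": [{"author_id": "u1", "created_at": "2020-12-21T15:37:07.000Z",
                  "id": "t1", "text": "tweet_text",
                  "attachments": {"media_keys": ["m1"]}}],
        "includes": {"media": [{"media_key": "m1", "type": "animated_gif"}],
                     "users": [{"id": "u1", "description": "bio"}]},
    },
    {
        "data": [{"author_id": "u2", "created_at": "2021-02-07T16:17:44.000Z",
                  "id": "t2", "text": "tweet_text",
                  "attachments": {"media_keys": ["m2"]}},
                 {"author_id": "u3", "created_at": "2021-02-06T20:34:06.000Z",
                  "id": "t3", "text": "tweet_text"}],
        "includes": {"media": [{"media_key": "m2", "type": "photo",
                                "url": "https://example.com/photo.jpg"}],
                     "users": [{"id": "u2", "description": ""}, {"id": "u3"}]},
    },
]

demo_path = os.path.join(tempfile.gettempdir(), "tweet_sample_demo.csv")
rows_written = write_tweets_csv(sample, demo_path)

# Read the file back to inspect what was written.
with open(demo_path, newline="", encoding="utf-8") as csv_file:
    rows = list(csv.reader(csv_file))
print(rows_written, rows)
```

The file is opened once in "w" mode so the header is written a single time; the resulting CSV can then be loaded with pd.read_csv as in the question.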

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
