Extract expanded_url from entities in Twitter user JSON

How can I get the expanded_url from entities?

{"blocked_by": false,
"blocking": false,
"contributors_enabled": false,
"created_at": "Mon Dec 27 16:09:18 +0000 2010",
"default_profile": false,
"default_profile_image": false,
"description": "Wealth Management. Banker. Former elite cyclist. 100% galego. Yield hunter. Until debt tear us apart | USC Alumni",
"entities": {
  "description": {
    "urls": []
  },
  "url": {
    "urls": [
      {
        "display_url": "fanecabrava.substack.com/?utm_source=ac\u2026",
        "expanded_url": "https://fanecabrava.substack.com/?utm_source=account-card&utm_content=writes",
        "indices": [
          0,
          23
        ],
        "url": "shorteners urls i deleted it to post this"
      }
    ]
  }
},
"favourites_count": 3808,
"follow_request_sent": false,
"followers_count": 578,
"following": false,
"friends_count": 465,
"geo_enabled": true,
"has_extended_profile": false,
"id": 231102009,
"id_str": "231102009",
"is_translation_enabled": false,
"is_translator": false,
"lang": null,

}

The full JSON file is available at this link:

https://drive.google.com/file/d/1zB1mmU5zHbJC6R7ZZESReB0bWGQVTvfx/view?usp=drivesdk

I want only the expanded URL, so that I can store it in a CSV file.



Solution 1:[1]

You could use a combination of pandas' read_json() and apply(): read_json() loads the file into a DataFrame, and apply() pulls the URL out of each row.

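import pandas as pd
import requests

# Load the user objects; each row is one user
data = pd.read_json('json.json')
print(data)

# alternative 1: from description section
# Take the first expanded_url under entities.description.urls, if there is one
urls = data['entities'].apply(
    lambda x: x['description']['urls'][0]['expanded_url']
    if len(x['description']['urls']) > 0 else pd.NA
)
urls = urls[urls.notna()]

# alternative 2: from url section
# The top-level 'url' column holds the shortened profile link
urls = data['url']

# expand urls
def expand_url(url):
    # Missing values can arrive as None or NaN, so check for both
    if pd.isna(url):
        return ''

    # Do not follow the redirect; read its target from the Location header
    r = requests.get(url, allow_redirects=False, timeout=10)
    try:
        return r.headers['location']
    except KeyError:
        return ''

expanded_url = urls.apply(expand_url)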

Output:

0                                                       
1                          https://www.sallyturbitt.com/
2                         https://johannesdrooghaag.com/
3                                                       
4                                                       
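
The JSON in the question already carries expanded_url under entities -> url -> urls, so a variant that makes no HTTP requests may be enough. The following is a minimal sketch, not part of the original answer, assuming the file is a JSON array of user objects shaped like the one in the question. The helper names extract_expanded_urls and write_rows, the second sample user, and the CSV column names are hypothetical and chosen for illustration only.

```python
import csv
import io


def extract_expanded_urls(users):
    """Return one (id_str, expanded_url) tuple per user; '' when no link exists."""
    rows = []
    for user in users:
        # entities.url can be absent when the profile has no link,
        # so fall back to an empty container at every level
        url_items = user.get("entities", {}).get("url", {}).get("urls", [])
        expanded = url_items[0].get("expanded_url") if url_items else None
        rows.append((user.get("id_str", ""), expanded or ""))
    return rows


def write_rows(rows, fileobj):
    """Write the rows as CSV with a header line to an open text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(["id_str", "expanded_url"])
    writer.writerows(rows)


# Trimmed sample: the first user mirrors the question's JSON,
# the second is a made-up user without a profile link.
# For the real file: users = json.load(open("json.json", encoding="utf-8"))
users = [
    {
        "id_str": "231102009",
        "entities": {
            "description": {"urls": []},
            "url": {
                "urls": [
                    {
                        "expanded_url": "https://fanecabrava.substack.com/"
                        "?utm_source=account-card&utm_content=writes"
                    }
                ]
            },
        },
    },
    {"id_str": "1", "entities": {"description": {"urls": []}}},
]

rows = extract_expanded_urls(users)

# Written to an in-memory buffer here; pass open(path, "w", newline="") for a file
buffer = io.StringIO()
write_rows(rows, buffer)
print(buffer.getvalue())
```

To write an actual file, pass a handle such as open("expanded_urls.csv", "w", newline="", encoding="utf-8") to write_rows instead of the in-memory buffer. Users without a profile link end up with an empty expanded_url column rather than being dropped, which keeps the rows aligned with the input.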

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1