Beautiful Soup nested tag search returns None after the 10th search

I am trying to write a Python script using Beautiful Soup that scrapes the name and symbol of each cryptocurrency. Although there are hundreds of coins listed, after the 10th iteration None gets returned. Could anyone help me out? The website I am trying to scrape is https://coinmarketcap.com

The Code:

from bs4 import BeautifulSoup
import requests
import csv

source = requests.get('https://coinmarketcap.com').text

soup = BeautifulSoup(source, 'html.parser')

def scrape_data():
    container = soup.find('tbody')
    theData = container.find_all("tr")
    for i in theData:
        individual_symbol = i.find('p', attrs={"class": "sc-1eb5slv-0 gGIpIK coin-item-symbol"})
        individual_name = i.find('p', attrs={"class": "sc-1eb5slv-0 iworPT"})
        print('Name: {}, Symbol: {}'.format(individual_name.text, individual_symbol.text))

scrape_data()

This is the output:

Name: Bitcoin, Symbol: BTC
Name: Ethereum, Symbol: ETH
Name: Tether, Symbol: USDT
Name: BNB, Symbol: BNB
Name: USD Coin, Symbol: USDC
Name: XRP, Symbol: XRP
Name: Terra, Symbol: LUNA
Name: Cardano, Symbol: ADA
Name: Solana, Symbol: SOL
Name: Avalanche, Symbol: AVAX
Traceback (most recent call last):
  File "/Users/ryan/Documents/PythonProjects/EODWebScrape/main.py", line 18, in <module>
    scrape_data()
  File "/Users/ryan/Documents/PythonProjects/EODWebScrape/main.py", line 15, in scrape_data
    print(individual_symbol.text)
AttributeError: 'NoneType' object has no attribute 'text'
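`find()` returns `None` when no tag matches, and calling `.text` on `None` raises the `AttributeError` above. The sketch below reproduces this on a small, hypothetical HTML snippet (the class names `coin-name` and `coin-item-symbol` and the placeholder row are made up for illustration; the real page's markup differs) and skips rows whose tags are missing instead of crashing.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first row is fully rendered, the second is an
# empty placeholder whose content would only be filled in by JavaScript.
html = """
<table><tbody>
  <tr><td><p class="coin-name">Bitcoin</p><p class="coin-item-symbol">BTC</p></td></tr>
  <tr><td><span></span></td></tr>
</tbody></table>
"""

def scrape_rows(html):
    """Return (name, symbol) pairs, skipping rows where either tag is missing."""
    soup = BeautifulSoup(html, 'html.parser')
    results = []
    for row in soup.find('tbody').find_all('tr'):
        # find() returns None when nothing matches, so check before using .text
        name = row.find('p', class_='coin-name')
        symbol = row.find('p', class_='coin-item-symbol')
        if name is None or symbol is None:
            continue
        results.append((name.text, symbol.text))
    return results

print(scrape_rows(html))  # [('Bitcoin', 'BTC')]
```

Skipping only avoids the crash; the missing rows still have to be obtained some other way, which is what the answers below address.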


Solution 1:[1]

The data is present within the <script> tags in JSON format. I always prefer to get the full data first, since you can then filter out whatever you need. This will get all of the available data:

Code:

import json
import re

import pandas as pd
import requests
from bs4 import BeautifulSoup

# Browser-like User-Agent; some sites respond differently to the default requests User-Agent
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}

dfs = []
for page in range(1, 21):  # 20 pages x 100 coins per page = 2000 rows
    print(f'Page: {page}')
    url = f'https://coinmarketcap.com/?page={page}'
    response = requests.get(url, headers=headers)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, 'html.parser')
    # The page state is embedded as JSON in the last <script> tag
    script = soup.find_all('script')[-1]

    # Take everything from the first '{' to the last '}' and parse it as JSON
    jsonStr = re.search(r'({.*})', str(script)).group(1)
    jsonData = json.loads(jsonStr)

    listing = jsonData['props']['initialState']['cryptocurrency']['listingLatest']['data']
    # The first element describes the columns; the remaining elements are the rows
    colsData = listing[0]
    cols = colsData['keysArr'] + colsData['excludeProps']
    data = listing[1:]

    df = pd.DataFrame(data, columns=cols)
    dfs.append(df)

df = pd.concat(dfs, axis=0)

# Keep only the two columns the question asks for
name_symbol = df[['name', 'symbol']]
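The core of this approach is pulling the JSON out of the `<script>` tag and turning the header-plus-rows layout into records. Below is a stdlib-only sketch on a tiny, hypothetical payload. The real page state is far larger, and its shape is assumed here from the code above: the first element of `data` carries the column names in `keysArr` and `excludeProps`, and each following element is assumed to be a list of values in that column order. The `var state =` prefix and the `rank` column are made up for illustration.

```python
import json
import re

# Hypothetical, heavily trimmed page; the real <script> payload is much larger.
page = (
    '<html><body><script>var state = '
    '{"props": {"initialState": {"cryptocurrency": {"listingLatest": {"data": ['
    '{"keysArr": ["name", "symbol"], "excludeProps": ["rank"]}, '
    '["Bitcoin", "BTC", 1], ["Ethereum", "ETH", 2]'
    ']}}}}};</script></body></html>'
)

def parse_listing(html):
    """Extract the embedded JSON and return one dict per row."""
    # Greedy match from the first '{' to the last '}' in the document
    match = re.search(r'({.*})', html, re.DOTALL)
    if match is None:
        raise ValueError('no JSON object found')
    state = json.loads(match.group(1))
    listing = state['props']['initialState']['cryptocurrency']['listingLatest']['data']
    # Assumed layout: first element names the columns, the rest are rows in that order
    header, rows = listing[0], listing[1:]
    cols = header['keysArr'] + header['excludeProps']
    return [dict(zip(cols, row)) for row in rows]

records = parse_listing(page)
print([(r['name'], r['symbol']) for r in records])  # [('Bitcoin', 'BTC'), ('Ethereum', 'ETH')]
```

The greedy regex only works if the tag holds a single JSON object; if the page puts other braces after it, parsing the tag's text directly with `json.loads` is safer.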

Full data:

print(df)
             ath        atl  ...  quotes.1.tvl  quotes.2.tvl
0   68789.625939  65.526001  ...           NaN           NaN
1    4891.704698   0.420897  ...           NaN           NaN
2       1.215490   0.568314  ...           NaN           NaN
3     690.931965   0.096109  ...           NaN           NaN
4       2.349556   0.929222  ...           NaN           NaN
..           ...        ...  ...           ...           ...
95      0.054136   0.000109  ...           NaN           NaN
96   1516.640112   0.000000  ...           NaN           NaN
97      0.066469   0.000600  ...           NaN           NaN
98      0.750742   0.000201  ...           NaN           NaN
99      0.015614   0.000111  ...           NaN           NaN

[2000 rows x 153 columns]

Name/Symbol:

print(name_symbol)
                 name symbol
0             Bitcoin    BTC
1            Ethereum    ETH
2              Tether   USDT
3                 BNB    BNB
4            USD Coin   USDC
..                ...    ...
95              HYCON    HYC
96  Pepemon Pepeballs  PPBLZ
97           IONChain   IONC
98          DecentBet   DBET
99          BlitzPick    XBP

[2000 rows x 2 columns]
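The index above ends at 99 even though there are 2000 rows because `pd.concat` keeps each page's own 0-based index, so every label appears once per page. Passing `ignore_index=True` renumbers the combined frame. A small sketch with two hypothetical two-row pages:

```python
import pandas as pd

# Two hypothetical "pages", each with its own 0-based index like the per-page frames above
page1 = pd.DataFrame({'name': ['Bitcoin', 'Ethereum'], 'symbol': ['BTC', 'ETH']})
page2 = pd.DataFrame({'name': ['Tether', 'BNB'], 'symbol': ['USDT', 'BNB']})

# Default: the original indices are kept, so labels repeat
kept = pd.concat([page1, page2], axis=0)
print(kept.index.tolist())  # [0, 1, 0, 1]

# ignore_index=True: the result gets a fresh 0..n-1 index
renumbered = pd.concat([page1, page2], axis=0, ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
print(renumbered.loc[2, 'symbol'])  # USDT
```

With the default behaviour, a label lookup such as `df.loc[95]` on the full frame would return one row per page rather than a single coin.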

Solution 2:[2]

OK, I checked this page, and it seems only the first 10 rows are loaded without JavaScript.

If you use requests, keep in mind that it only works for static data, not data loaded with JavaScript. So if something doesn't work as expected, check the page with JavaScript disabled.
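One quick way to do that check from Python is to look for values you can see in the browser inside the raw HTML that requests receives. A minimal sketch: `in_static_html` is a hypothetical helper, and the `raw` string is a made-up stand-in for `response.text`.

```python
def in_static_html(raw_html, visible_values):
    """Split values seen in the browser into those present in the raw HTML and those missing.

    Missing values were most likely inserted later by JavaScript (or are
    formatted differently in the source), so requests alone will not see them.
    """
    present = [v for v in visible_values if v in raw_html]
    missing = [v for v in visible_values if v not in raw_html]
    return present, missing

# Stand-in for response.text: only the first coin is server-rendered here
raw = '<tbody><tr><td>Bitcoin BTC</td></tr><tr><td></td></tr></tbody>'
present, missing = in_static_html(raw, ['BTC', 'ETH'])
print(present)  # ['BTC']
print(missing)  # ['ETH']
```

A plain substring test is crude (it can match text in unrelated places), but it is usually enough to tell whether the data arrives with the HTML or has to be found elsewhere, for example in a `<script>` tag as in Solution 1.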

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: chitown88
Solution 2: Dharman