What is the fix for this error: 'NoneType' object has no attribute 'prettify'?

I want to scrape this URL https://aviation-safety.net/wikibase/type/C206.

I don't understand what the error below means: 'NoneType' object has no attribute 'prettify'

import requests
import pandas as pd
from bs4 import BeautifulSoup
from urllib.request import Request

url = 'https://aviation-safety.net/wikibase/type/C206'
req = Request(url , headers = {
                          'accept':'*/*',
                          'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36'})

data = []

while True:
    print(url)
    html = requests.get(url)
    soup = BeautifulSoup(html.text, 'html.parser')
    data.append(pd.read_html(soup.select_one('tbody').prettify())[0])

    if soup.select_one('div.pagenumbers + div a[href]'):
        url = soup.select_one('div.pagenumbers + div a')['href']
    else:
        break
df = pd.concat(data)
df.to_csv('206.csv',encoding='utf-8-sig',index=False)


Solution 1:[1]

You're not passing your headers to `requests.get` — you build a `urllib` `Request` with them but never use it — which is why you're not getting the right HTML. Also, the table you're after is the second one on the page, not the first. And I'd highly recommend using requests over urllib.request throughout.
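As for the error itself: `select_one` returns `None` when nothing matches the selector (here, because the blocked response contains no `<tbody>`), and calling any method on `None` raises exactly that `AttributeError`. A minimal, self-contained illustration (the HTML is made up, standing in for the blocked response):

```python
from bs4 import BeautifulSoup

# HTML with no <tbody>, standing in for the blocked response
soup = BeautifulSoup("<html><body><p>denied</p></body></html>", "html.parser")

print(soup.select_one("tbody"))  # None: nothing matches the selector

try:
    soup.select_one("tbody").prettify()
except AttributeError as e:
    print(e)  # 'NoneType' object has no attribute 'prettify'
```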

So, having said that, here's how to get all the tables from all the pages:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://aviation-safety.net/wikibase/type/C206'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.190 Safari/537.36',
}

data = []
with requests.Session() as s:
    # Read the highest page number from the pager links on the first page
    total_pages = int(
        BeautifulSoup(s.get(url, headers=headers).text, "lxml")
        .select("div.pagenumbers > a")[-1]
        .getText()
    )

    for page in range(1, total_pages + 1):
        print(f"Getting page: {page}...")
        # The occurrence table is the second <table> on each page, hence [1]
        data.append(
            pd.read_html(
                s.get(f"{url}/{page}", headers=headers).text,
                flavor="lxml",
            )[1]
        )

df = pd.concat(data)
df.to_csv('206.csv', sep=";", index=False)
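A note on the `[1]` index: `pd.read_html` returns a list with one DataFrame per `<table>` it finds, and on this page the occurrence table is the second one. A small local sketch with hypothetical HTML (not the site's actual markup):

```python
from io import StringIO
import pandas as pd

# Two tiny tables standing in for the page's layout table and data table
html = """
<table><tr><th>nav</th></tr><tr><td>menu</td></tr></table>
<table><tr><th>date</th></tr><tr><td>2021-01-01</td></tr></table>
"""

tables = pd.read_html(StringIO(html))
print(len(tables))           # 2 DataFrames, one per <table>
print(tables[1].columns[0])  # 'date' -- the second table holds the data
```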

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
