Cannot web scrape Tripadvisor

I am trying to scrape the "Things to Do" listings on Tripadvisor (for example, https://www.tripadvisor.com/Attractions-g30196-Activities-c57-Austin_Texas.html), but I am stuck on the first few lines of code. The request hangs: I waited more than 10 minutes without a response. The same code and link worked three days ago, but now they produce nothing. The code is:

import requests
trip = 'https://www.tripadvisor.com/Tourism-g30196-Austin_Texas-Vacations.html'
response = requests.get(trip)
print(type(response))


I don't know what is going on here. Looking forward to your help! Thanks a lot.



Solution 1:[1]

First, try setting the User-Agent header to that of a real web browser (to start, the shorter Mozilla/5.0 may be enough). By default, requests sends something like python-requests/2.x, so the server can recognize the script and block it. Some servers also use this header to send different content to different browsers or devices (desktop, tablet, phone).
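To check which User-Agent requests would send, the following minimal sketch builds two requests without sending them, so it needs no network access. It assumes only the public requests API (Session.prepare_request and Request); the variable names are illustrative.

```python
import requests

url = "https://www.tripadvisor.com/Attractions-g30196-Activities-c57-Austin_Texas.html"
session = requests.Session()

# Prepare (but do not send) a request with the library's default headers.
default_req = session.prepare_request(requests.Request("GET", url))

# Prepare the same request with the User-Agent overridden.
custom_req = session.prepare_request(
    requests.Request("GET", url, headers={"User-Agent": "Mozilla/5.0"})
)

# The default value depends on the installed version, e.g. python-requests/2.x.y
print(default_req.headers["User-Agent"])
print(custom_req.headers["User-Agent"])
```

The second value is what the server would see once the header is overridden.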

import requests
from bs4 import BeautifulSoup

#url = 'https://www.tripadvisor.com/Tourism-g30196-Austin_Texas-Vacations.html'
url = 'https://www.tripadvisor.com/Attractions-g30196-Activities-c57-Austin_Texas.html'

# Send a browser-like User-Agent instead of the default python-requests one
response = requests.get(url, headers={'User-Agent': "Mozilla/5.0"})

soup = BeautifulSoup(response.text, 'html.parser')

# Attraction titles were in <span name="title"> elements when this was written
items = soup.find_all('span', {'name': 'title'})

for i in items:
    print(i.text)

Result:

1. Lady Bird Lake Hike-and-Bike Trail
2. Barton Springs Pool
3. Mount Bonnell
4. Congress Avenue Bridge / Austin Bats
5. Lady Bird Johnson Wildflower Center
6. Austin Aquarium
7. Zilker Metropolitan Park
8. McKinney Falls State Park
9. Barton Creek Greenbelt
10. Austin Zoo
11. Mayfield Park
12. Zilker Botanical Garden
13. Town Lake
14. Westcave Outdoor Discovery Center
15. Bull Creek District Park
16. Austin Nature & Science Center
17. Turkey Creek Trail
18. River Place Nature Trails
19. Mueller Lake Park
20. Zilker Playground
21. Deep Eddy Pool
22. Red Bud Isle Park
23. Mansfield Dam Park
24. Pease Park
25. Wild Basin Preserve
26. Emma Long Metropolitan Park
27. Shoal Creek Greenbelt
28. Commons Ford Ranch
29. Hornsby Bend Bird Observatory
30. Mary Moore Searight Metropolitan Park
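The question mentions waiting more than 10 minutes for a response. requests applies no timeout unless one is passed, so a request to a server that never answers can hang indefinitely. The sketch below makes the call fail fast instead. The helper name fetch and the 10-second value are assumptions chosen for illustration, not part of the original answer.

```python
import requests


def fetch(url, timeout=10):
    """Return the page HTML, or None if the request fails or times out."""
    headers = {"User-Agent": "Mozilla/5.0"}
    try:
        # timeout limits the wait for a connection and for each read
        response = requests.get(url, headers=headers, timeout=timeout)
        # Raise an exception for 4xx/5xx responses (e.g. when blocked)
        response.raise_for_status()
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return None
    return response.text
```

Calling fetch(url) with the URL from the answer would then either return the HTML or print the reason for the failure, rather than blocking without feedback.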

EDIT:

In my GitHub repository python-examples / scraping you can find code from other Stack Overflow answers that scrapes Tripadvisor using Selenium and Scrapy.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
