Python scraping: print shows []
I want to scrape the timetable from this page, but my code prints [] at the end.
How can I get the timetable using Beautiful Soup?
This is the page: https://www.thsrc.com.tw/ArticleContent/a3b630bb-1066-4352-a1ef-58c7b4e8ef7c
Which tags or classes should I use?
from selenium import webdriver
from selenium.webdriver.support.ui import Select
import pandas as pd
import time
import requests as req
from bs4 import BeautifulSoup
url="https://www.thsrc.com.tw/ArticleContent/a3b630bb-1066-4352-a1ef-58c7b4e8ef7c"
driver=webdriver.Chrome()
driver.get(url)
ag=driver.find_element_by_class_name("swal2-confirm.swal2-styled")
ag.click()
#start
def start():
    print("Enter the starting station")
    start=Select(driver.find_element_by_id("select_location01"))
    all_option=start.select_by_visible_text(input(""))
start()
#terminal
def stand():
    print("terminal")
    stand=Select(driver.find_element_by_id("select_location02"))
    all_option2=stand.select_by_visible_text(input(""))
stand()
#search
sr=driver.find_element_by_id("start-search")
sr.click()
soup = BeautifulSoup(driver.page_source, "lxml")
dan=[]
data=soup.find_all("div","timeTableTrain_S")
dan.append(data)
print(dan)
time.sleep(5)
driver.quit()
Solution 1:[1]
The timetable is loaded by JavaScript from the site's API, so it's simpler to call that API directly than to parse the rendered page. This example prints all train/station info for the departure and destination tables:
import json
import requests
api_url = "https://www.thsrc.com.tw/TimeTable/Search"
payload = {
    "SearchType": "S",
    "Lang": "TW",
    "StartStation": "NanGang",
    "EndStation": "ZuoYing",
    "OutWardSearchDate": "2022/05/20",
    "OutWardSearchTime": "10:30",
    "ReturnSearchDate": "2022/05/20",
    "ReturnSearchTime": "10:30",
    "DiscountType": "",
}
data = requests.post(api_url, data=payload).json()
print(json.dumps(data, indent=4))
Prints:
{
    "success": true,
    "data": {
        "DepartureTable": {
            "Title": {
                "StartStationName": "\u5357\u6e2f",
                "EndStationName": "\u5de6\u71df",
                "TitleSplit1": "2022/05/20(\u4e94) 10:30",
                "TitleSplit2": "2022/05/20(\u4e94)"
            },
            "TrainItem": [
                {
                    "TrainNumber": "0803",
                    "DepartureTime": "06:15",
                    "DestinationTime": "08:40",
                    "Duration": "02:25",
                    "NonReservedCar": "10-12",
                    "Discount": [
                        {
                            "Id": "9973b559-8279-4bf4-90be-601f7973a39f",
                            "Name": "20\u4eba\u5718\u9ad4",
                            "Value": "8\u6298",
                            "Color": "#1685e4",
                            "Discount": "80"
                        },
                        {
                            "Id": "68d9fc7b-7330-44c2-962a-74bc47d2ee8a",
                            "Name": "\u5927\u5b78\u751f",
                            "Value": "5\u6298",
                            "Color": "#e45916",
                            "Discount": "5"
                        },
                        {
                            "Id": "40863ff1-a16c-4da1-8af7-c1f8991627f3",
                            "Name": "\u6821\u5916\u6559\u5b78",
                            "Value": "4/7\u6298",
                            "Color": "#ffcc00",
                            "Discount": "40|70"
                        }
                    ],
                    "Note": "",
                    "Sequence": 0,
                    "StationInfo": [
                        {
                            "StationNo": "01",
                            "StationName": "\u5357\u6e2f",
                            "DepartureTime": "06:15",
                            "Show": true,
                            "ColorClass": "orange"
                        },
... and so on.
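To turn a response like this into timetable rows, you can walk the "TrainItem" list. The following is a minimal sketch, not a definitive implementation: the helper name train_rows and the trimmed sample dict are illustrative assumptions, and only the keys visible in the output above (TrainNumber, DepartureTime, DestinationTime, Duration) are relied on.

```python
# Trimmed sample shaped like the response printed above (an assumption for
# illustration; a real response carries more fields and more trains).
sample = {
    "success": True,
    "data": {
        "DepartureTable": {
            "TrainItem": [
                {
                    "TrainNumber": "0803",
                    "DepartureTime": "06:15",
                    "DestinationTime": "08:40",
                    "Duration": "02:25",
                },
            ]
        }
    },
}


def train_rows(response, table="DepartureTable"):
    """Return (train number, departure, arrival, duration) tuples for one table.

    Hypothetical helper: returns an empty list when the table is missing.
    """
    tables = response.get("data") or {}
    items = (tables.get(table) or {}).get("TrainItem") or []
    return [
        (t["TrainNumber"], t["DepartureTime"], t["DestinationTime"], t["Duration"])
        for t in items
    ]


rows = train_rows(sample)
for row in rows:
    print(*row)
```

With the live API you would pass the dict returned by requests.post(api_url, data=payload).json() instead of sample.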
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Andrej Kesely |
