Is there a way other than status_code to determine whether a request succeeded?
I'm using Python 3 with BeautifulSoup. I want to scrape data for a few employees from a site, based on their ID numbers.
My code:
    for UID in range(201810000, 201810020):
        ID = UID
        print(ID)
        # scraped data
        ZeroDay = s.post("https://site/Add_StudantRow.php", data={"SID": ID})
        ZeroDay_content = bs(ZeroDay.content, "html.parser", from_encoding='windows-1256')
        std_ID = ZeroDay_content.find("input", {"name": "SID[]"})["value"]
        std_name = ZeroDay_content.find("input", {"name": "Name[]"})["value"]
        std_major_ = ZeroDay_content.select_one("option[selected]", {"name": "Qualifications[]"})["value"]
        std_major = ZeroDay_content.find("input", {"name": "Specialization[]"})["value"]
        std_social = ZeroDay_content.select_one("select[name='MILITARY_STATUS[]'] option[selected]")["value"]
        std_ID_num = ZeroDay_content.find("input", {"name": "ID_Number[]"})["value"]
        std_gender = ZeroDay_content.select_one("select[name='Gender[]'] option[selected]")["value"]
        print(std_ID, std_name, std_gender, std_major, std_major_, std_ID_num, std_social)
After I ran my code, this error appeared:
    std_ID = ZeroDay_content.find("input", {"name":"SID[]"})["value"]
    TypeError: 'NoneType' object is not subscriptable
I assigned a range of IDs from 201810000 to 201810020, but not all of the IDs are valid. For example, 201810015 may be invalid while 201810018 is valid.
Note: when I put a valid ID in UID, the error does not appear. The error seems to occur when an ID returns a null value, but how can I loop over a range of IDs in this case?
Solution 1:[1]
As not all of your UID values return a valid page, you need to first test for the presence of a required tag. As you are looking for form elements, I assume there will be an enclosing <form> tag you could test for first.
For example:
    # s and bs are the session and BeautifulSoup names used in the question.
    for UID in range(201810000, 201810020):
        ID = UID
        print(ID)
        ZeroDay = s.post("https://site/Add_StudantRow.php", data={"SID": ID})
        ZeroDay_content = bs(ZeroDay.content, "html.parser", from_encoding='windows-1256')
        # Only read the fields when the expected form is present.
        if ZeroDay_content.find("form", <xxxxxxx>):
            std_ID = ZeroDay_content.find("input", {"name": "SID[]"})["value"]
            std_name = ZeroDay_content.find("input", {"name": "Name[]"})["value"]
            std_major_ = ZeroDay_content.select_one("option[selected]", {"name": "Qualifications[]"})["value"]
            std_major = ZeroDay_content.find("input", {"name": "Specialization[]"})["value"]
            std_social = ZeroDay_content.select_one("select[name='MILITARY_STATUS[]'] option[selected]")["value"]
            std_ID_num = ZeroDay_content.find("input", {"name": "ID_Number[]"})["value"]
            std_gender = ZeroDay_content.select_one("select[name='Gender[]'] option[selected]")["value"]
            print(std_ID, std_name, std_gender, std_major, std_major_, std_ID_num, std_social)
Where <xxxxxxx> would be suitable attributes to search for.
The error occurs because your first .find() call returns None, which indicates that the item is not present. You then apply ["value"] to None without first testing whether the item was found, and that raises the TypeError.
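If the enclosing form's attributes are not known, the same idea can be applied to each field instead. The following is only a sketch under assumptions: the helper names input_value and extract_record and the FIELDS mapping are hypothetical, and the soup argument is assumed to be the BeautifulSoup object built in the loop above. It treats a missing SID[] input as the sign of an invalid ID.

```python
# Hypothetical mapping from result keys to the input names used in the question.
FIELDS = {
    "std_ID": "SID[]",
    "std_name": "Name[]",
    "std_major": "Specialization[]",
    "std_ID_num": "ID_Number[]",
}


def input_value(soup, name):
    """Return the value attribute of <input name=...>, or None if it is absent."""
    tag = soup.find("input", {"name": name})
    if tag is None:
        return None  # the tag was not found, so there is nothing to subscript
    return tag.get("value")


def extract_record(soup):
    """Return a dict of field values, or None when the SID[] input is missing."""
    if input_value(soup, "SID[]") is None:
        return None  # assumed to mean the ID is invalid; the caller can skip it
    return {key: input_value(soup, name) for key, name in FIELDS.items()}


# Usage inside the loop (sketch):
#     record = extract_record(ZeroDay_content)
#     if record is None:
#         continue
#     print(record)
```

Returning None rather than raising keeps the loop simple: an invalid ID is skipped and the next one is tried.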
Solution 2:[2]
I resolved this by adding an if statement that uses the content length to determine whether the request returned anything. I noticed that the content length is less than 170 when the request returns nothing and more than 170 when it returns something.
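A minimal sketch of that check follows. The name has_data is hypothetical, and measuring the body with len() rather than reading the Content-Length header is an assumption, chosen because that header can be absent (for example with chunked transfer encoding). The 170-byte cutoff is the value observed above and is specific to this site.

```python
EMPTY_THRESHOLD = 170  # bytes; the cutoff observed for this particular site


def has_data(body, threshold=EMPTY_THRESHOLD):
    """Return True when the response body is larger than the threshold."""
    return len(body) > threshold


# Usage inside the loop (sketch):
#     ZeroDay = s.post("https://site/Add_StudantRow.php", data={"SID": ID})
#     if not has_data(ZeroDay.content):
#         continue  # nothing useful came back for this ID
```

A size cutoff like this is fragile if the page layout changes, so testing for a required tag is the more robust check where it is possible.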
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Martin Evans |
Solution 2 | xxxzman |