How do I use soup.find and soup.find_all?
Here is my code and its output:
import requests
from bs4 import BeautifulSoup
res = requests.get("https://www.jobberman.com/jobs")
soup = BeautifulSoup(res.text, "html.parser")
job = soup.find("div", class_ = "relative inline-flex flex-col w-full text-sm font-normal pt-2")
company_name = job.find('a[href*="jobs"]')
print(company_name)
The output is:
None
But when I use the select method, I get the desired result, but I can't use .text on it:
import requests
from bs4 import BeautifulSoup
res = requests.get("https://www.jobberman.com/jobs")
soup = BeautifulSoup(res.text, "html.parser")
job = soup.find("div", class_ = "relative inline-flex flex-col w-full text-sm font-normal pt-2")
company_name = job.select('a[href*="jobs"]').text
print(company_name)
Output:
AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?
Solution 1:[1]
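Change your selection strategy. The main issue here is that not all company names are linked, so selecting the <a> elements misses some of them. Select the element that holds the company name instead. (Note also that find() and find_all() take a tag name plus attribute filters, not a CSS selector, which is why job.find('a[href*="jobs"]') returns None. select() does accept CSS selectors but returns a ResultSet, which is a list, so use select_one() when you want a single element's .text.)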
job.find('div',{'class':'search-result__job-meta'}).text.strip()
or
job.select_one('.search-result__job-meta').text.strip()
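A minimal, self-contained sketch contrasting find(), select() and select_one(). The HTML below is hypothetical and only mimics the class names used above; the live Jobberman markup may differ, so treat this as an illustration of the API behavior rather than a scraper for that page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; not copied from jobberman.com.
HTML = """
<div class="job">
  <div class="search-result__job-meta"> Balkaan Employments service </div>
  <a href="/jobs?company=balkaan">Balkaan Employments service</a>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")
job = soup.find("div", class_="job")

# find() treats its first argument as a tag name, so this searches for a tag
# literally named 'a[href*="jobs"]' and finds nothing.
by_find = job.find('a[href*="jobs"]')

# select() understands CSS selectors but returns a ResultSet (a list),
# so .text has to be called on each element, not on the list.
by_select = job.select('a[href*="jobs"]')
link_texts = [a.text for a in by_select]

# select_one() returns the first matching element (or None).
meta = job.select_one(".search-result__job-meta").text.strip()

# The find() equivalent: a tag name plus a class_ filter.
meta_via_find = job.find("div", class_="search-result__job-meta").text.strip()

print(by_find)  # None
print(link_texts)
print(meta)
print(meta_via_find)
```

Either select_one() with a CSS selector or find() with a tag name and class_ filter gives a single element, which is what .text expects.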
Example
Also, store your information in a structured way for post-processing:
import requests
from bs4 import BeautifulSoup
res = requests.get("https://www.jobberman.com/jobs")
soup = BeautifulSoup(res.text, "html.parser")
data = []
for job in soup.select('div:has(>.search-result__body)'):
    data.append({
        'job': job.h3.text,
        'company': job.select_one('.search-result__job-meta').text.strip()
    })
data
Output:
[{'job': 'Restaurant Manager', 'company': 'Balkaan Employments service'},
{'job': 'Executive Assistant', 'company': 'Nolla Fresh & Frozen ltd'},
{'job': 'Portfolio Manager/Instructor 1', 'company': 'Fun Science World'},
{'job': 'Microbiologist', 'company': "NEIMETH INT'L PHARMACEUTICALS PLC"},
{'job': 'Data Entry Officer', 'company': 'Nkoyo Pharmaceuticals Ltd.'},
{'job': 'Chemical Analyst', 'company': "NEIMETH INT'L PHARMACEUTICALS PLC"},
{'job': 'Senior Front-End Engineer', 'company': 'Salvo Agency'},...]
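To illustrate why selecting only linked names falls short, here is a small sketch on hypothetical markup in which the second listing's company name has no <a> tag. The wrapper class "listing" is invented for this example; the selectors are the ones used in the answer above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second company name is plain text, not a link.
HTML = """
<section>
  <div class="listing">
    <h3>Restaurant Manager</h3>
    <div class="search-result__body">
      <div class="search-result__job-meta">
        <a href="/jobs?company=balkaan">Balkaan Employments service</a>
      </div>
    </div>
  </div>
  <div class="listing">
    <h3>Executive Assistant</h3>
    <div class="search-result__body">
      <div class="search-result__job-meta">
        Nolla Fresh &amp; Frozen ltd
      </div>
    </div>
  </div>
</section>
"""

soup = BeautifulSoup(HTML, "html.parser")

# One dict per listing: each div that directly contains .search-result__body.
data = []
for job in soup.select("div:has(> .search-result__body)"):
    data.append({
        "job": job.h3.text,
        "company": job.select_one(".search-result__job-meta").text.strip(),
    })

# Selecting only links finds just the company name that happens to be linked.
linked = [a.text for a in soup.select('a[href*="jobs"]')]

print(data)
print(linked)
```

A list of dicts like data can be handed directly to csv.DictWriter or pandas.DataFrame for further processing.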
Solution 2:[2]
The problems with your search strategy have been covered by the comments and answers posted earlier. Here is an alternative that uses the re (regular expression) module together with find_all():
import requests
from bs4 import BeautifulSoup
import re
res = requests.get("https://www.jobberman.com/jobs")
soup = BeautifulSoup(res.text, "html.parser")
# Raw string so that \? reaches the regex engine as an escaped "?"
company_name = soup.find_all("a", href=re.compile(r"/jobs\?"), rel="nofollow")
for link in company_name:
    print(link.text)
Output:
GRATIAS DEI NIGERIA LIMITED
Balkaan Employments service
Fun Science World
NEIMETH INT'L PHARMACEUTICALS PLC
Nkoyo Pharmaceuticals Ltd.
...
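A self-contained sketch of the same filter on hypothetical markup (the links below are invented; the real page may differ). It also shows a roughly equivalent CSS selector, which is an alternative I am suggesting rather than part of the answer above. BeautifulSoup matches a compiled regex against the attribute with search(), so the pattern can occur anywhere in href, and because rel is a multi-valued attribute, rel="nofollow" matches any tag whose rel list contains that value.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
HTML = """
<a href="/jobs?company=gratias" rel="nofollow">GRATIAS DEI NIGERIA LIMITED</a>
<a href="/jobs/restaurant-manager">Restaurant Manager</a>
<a href="/jobs?company=fun-science" rel="nofollow">Fun Science World</a>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all with a regex on href (searched anywhere in the value) and rel filter.
by_regex = [
    a.text
    for a in soup.find_all("a", href=re.compile(r"/jobs\?"), rel="nofollow")
]

# CSS alternative: substring match on href, whitespace-separated word match on rel.
by_css = [a.text for a in soup.select('a[href*="/jobs?"][rel~="nofollow"]')]

for name in by_regex:
    print(name)
```

The job-title link has neither "/jobs?" in its href nor rel="nofollow", so both approaches skip it.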
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | Cheo Kee Jin |
