How to extract elements from a list with specific features

I am trying to scrape the Wuzzuf website, and I want to extract the job skills with this code:

import requests
from bs4 import BeautifulSoup

result = requests.get("https://wuzzuf.net/search/jobs/?q=data+analysis&a=navbl")
src = result.content
soup = BeautifulSoup(src, "lxml")
job_skills = soup.find_all("div", {"class": "css-y4udm8"})

But instead it returns the entire content of each div. I only want the <a> elements inside the divs with class="css-y4udm8".



Solution 1:[1]

Without more details and the expected output, it is hard to give an exact answer, so improving your question would help.

Assuming that the skills are the individual list items prefixed with a dot (·), you could use the following selector to extract the unique skills:

set(soup.select('div.css-y4udm8 a:-soup-contains("· ")'))
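The selector above can be tried offline before pointing it at the live site. The following is a minimal sketch, assuming a made-up HTML fragment that only imitates the structure described here (the real page markup may differ); it uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, loosely modelled on two job cards; not copied from the live page.
html = """
<div class="css-y4udm8">
  <a href="/a/Full-Time-Jobs-in-Egypt">Full Time</a>
  <a href="/a/SQL-Jobs-in-Egypt"> · SQL</a>
  <a href="/a/Python-Jobs-in-Egypt"> · Python</a>
</div>
<div class="css-y4udm8">
  <a href="/a/Python-Jobs-in-Egypt"> · Python</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Keep only the anchors whose text contains the "· " separator.
dotted = soup.select('div.css-y4udm8 a:-soup-contains("· ")')

# Strip the separator and surrounding whitespace; the set removes duplicates.
skills = {a.get_text().replace("·", "").strip() for a in dotted}
print(sorted(skills))
```

In this fragment the "Full Time" link is skipped because its text has no dot prefix, and "Python" appears only once in the result even though it occurs in both divs.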

If you also want the other <a> elements in the <div>, just use:

set(soup.select('div.css-y4udm8 a'))
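If you would rather stay with find_all() as in the question, the same anchors can be reached by searching inside each matched div. This is a sketch on a made-up fragment (the markup and the "other" class are assumptions for illustration), comparing both approaches:

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only.
html = """
<div class="css-y4udm8">
  <span>Posted 2 days ago</span>
  <a href="/a/Full-Time-Jobs-in-Egypt">Full Time</a>
  <a href="/a/SQL-Jobs-in-Egypt"> · SQL</a>
</div>
<div class="other"><a href="/ignored">Ignored</a></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find_all() on the div returns whole <div> elements, as seen in the question.
divs = soup.find_all("div", {"class": "css-y4udm8"})

# Searching again inside each div collects only its <a> descendants.
hrefs_find_all = [a["href"] for div in divs for a in div.find_all("a")]

# The CSS selector reaches the same anchors in a single step.
hrefs_select = [a["href"] for a in soup.select("div.css-y4udm8 a")]

print(hrefs_find_all)
print(hrefs_select)
```

Both lists contain the two links from the first div only; the <span> and the link in the unrelated div are left out.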

An alternative is to store the URL and the text in a dict:

{a['href']:a.text.replace(' · ','') for a in soup.select('div.css-y4udm8 a')}

Output:

{'/a/Full-Time-Jobs-in-Egypt': 'Full Time', '/a/Experienced-Jobs-in-Egypt': 'Experienced', '/a/Analyst-Research-Jobs-in-Egypt': 'Analyst/Research', '/a/Sales-Jobs-in-Egypt': 'Sales', '/a/Analysis-Jobs-in-Egypt': 'Analysis', '/a/CRM-Jobs-in-Egypt': 'CRM', '/a/Data-Jobs-in-Egypt': 'Data', '/a/Data-Analysis-Jobs-in-Egypt': 'Data Analysis', '/a/Microsoft-Office-Jobs-in-Egypt': 'Microsoft Office', '/a/Sales-Target-Jobs-in-Egypt': 'Sales Target', '/a/Sales-Skills-Jobs-in-Egypt': 'Sales Skills', '/a/IT-Software-Development-Jobs-in-Egypt': 'IT/Software Development', '/a/Engineering-Telecom-Technology-Jobs-in-Egypt': 'Engineering - Telecom/Technology', '/a/Insurance-Jobs-in-Egypt': 'Insurance', '/a/Retail-Jobs-in-Egypt': 'Retail', '/a/Computer-Science-Jobs-in-Egypt': 'Computer Science', '/a/Information-Technology-IT-Jobs-in-Egypt': 'Information Technology (IT)', '/a/Analyze-Jobs-in-Egypt': 'Analyze', '/a/English-Jobs-in-Egypt': 'English', '/a/Data-Analytics-Jobs-in-Egypt': 'Data Analytics', '/a/Manager-Jobs-in-Egypt': 'Manager', '/a/Quality-Jobs-in-Egypt': 'Quality', '/a/SQL-Jobs-in-Egypt': 'SQL', '/a/ETL-Jobs-in-Egypt': 'ETL', '/a/Microsoft-SQL-Server-Jobs-in-Egypt': 'Microsoft SQL Server', '/a/Data-Developer-Jobs-in-Egypt': 'Data Developer', '/a/Data-Stage-Jobs-in-Egypt': 'Data Stage', '/a/Accounting-Finance-Jobs-in-Egypt': 'Accounting/Finance', '/a/Analyst-Jobs-in-Egypt': 'Analyst', '/a/Data-Analyst-Jobs-in-Egypt': 'Data Analyst', '/a/Relational-Databases-Jobs-in-Egypt': 'Relational Databases', '/a/Software-Jobs-in-Egypt': 'Software', '/a/Entry-Level-Jobs-in-Egypt': 'Entry Level', '/a/Customer-Service-Jobs-in-Egypt': 'Customer Service', '/a/Marketing-PR-Advertising-Jobs-in-Egypt': 'Marketing/PR/Advertising', '/a/Microsoft-PowerPoint-Jobs-in-Egypt': 'Microsoft PowerPoint', '/a/Microsoft-Outlook-Jobs-in-Egypt': 'Microsoft Outlook', '/a/Microsoft-Word-Jobs-in-Egypt': 'Microsoft Word', '/a/After-Sales-Jobs-in-Egypt': 'After Sales', '/a/Electronics-Jobs-in-Egypt': 
'Electronics', '/a/Software-Development-Jobs-in-Egypt': 'Software Development', '/a/Python-Jobs-in-Egypt': 'Python', '/a/Big-Data-Jobs-in-Egypt': 'Big Data', '/a/Logistics-Supply-Chain-Jobs-in-Egypt': 'Logistics/Supply Chain', '/a/Power-BI-Jobs-in-Egypt': 'Power BI', '/a/Tableau-Jobs-in-Egypt': 'Tableau', '/a/Economics-Jobs-in-Egypt': 'Economics', '/a/Business-Jobs-in-Egypt': 'business', '/a/Statistics-Jobs-in-Egypt': 'Statistics', '/a/Reporting-Jobs-in-Egypt': 'Reporting', '/a/Database-Jobs-in-Egypt': 'Database', '/a/Analytical-Jobs-in-Egypt': 'analytical', '/a/Communication-Jobs-in-Egypt': 'Communication', '/a/Communication-Skills-Jobs-in-Egypt': 'Communication skills', '/a/Engineering-Jobs-in-Egypt': 'Engineering', '/a/Programming-Jobs-in-Egypt': 'Programming', '/a/Engineering-Mechanical-Electrical-Jobs-in-Egypt': 'Engineering - Mechanical/Electrical', '/a/Electrical-Jobs-in-Egypt': 'Electrical', '/a/Computer-Engineering-Jobs-in-Egypt': 'Computer Engineering', '/a/Microsoft-Excel-Jobs-in-Egypt': 'Microsoft Excel', '/a/Data-Entry-Jobs-in-Egypt': 'Data Entry', '/a/Microsoft-Power-BI-Jobs-in-Egypt': 'Microsoft Power BI', '/a/Business-Analysis-Jobs-in-Egypt': 'Business Analysis', '/a/BI-Jobs-in-Egypt': 'BI', '/a/Computer-Skills-Jobs-in-Egypt': 'Computer Skills', '/a/Administration-Jobs-in-Egypt': 'Administration', '/a/Big-Data-Analytics-Jobs-in-Egypt': 'Big Data Analytics'}
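The dict approach can also be checked on a small sample. The sketch below assumes a made-up fragment and adds one precaution that is not in the answer: the selector a[href] skips anchors without an href attribute, which would otherwise raise a KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the last anchor has no href on purpose.
html = """
<div class="css-y4udm8">
  <a href="/a/Full-Time-Jobs-in-Egypt">Full Time</a>
  <a href="/a/Python-Jobs-in-Egypt"> · Python</a>
</div>
<div class="css-y4udm8">
  <a href="/a/Python-Jobs-in-Egypt"> · Python</a>
  <a>No link</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

# Map each URL to its cleaned label; repeated URLs collapse into one key.
links = {
    a["href"]: a.get_text().replace("·", "").strip()
    for a in soup.select("div.css-y4udm8 a[href]")
}

for href, label in links.items():
    print(href, "->", label)
```

Because the href is the key, a skill that appears in several job cards is stored once, which is why the dict also works as a way to deduplicate.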

Example

from bs4 import BeautifulSoup
import requests

result = requests.get("https://wuzzuf.net/search/jobs/?q=data+analysis&a=navbl")
src = result.content
soup = BeautifulSoup(src, "lxml")
skills = set(a.text.replace(' · ','') for a in soup.select('div.css-y4udm8 a:-soup-contains("· ")'))
print(skills)

Output

{'analytical', 'English', 'Statistics', 'Engineering', 'Information Technology (IT)', 'Reporting', 'Accounting/Finance', 'Computer Skills', 'CRM', 'SQL', 'Microsoft SQL Server', 'business', 'Sales', 'Microsoft Office', 'Tableau', 'Marketing/PR/Advertising', 'Analyst', 'Software', 'Microsoft Excel', 'BI', 'Analyst/Research', 'Engineering - Mechanical/Electrical', 'Electrical', 'IT/Software Development', 'Retail', 'Computer Science', 'Communication', 'Engineering - Telecom/Technology', 'Sales Target', 'Sales/Retail', 'Analyze', 'Logistics/Supply Chain', 'Microsoft PowerPoint', 'Communication skills', 'After Sales', 'Customer Service', 'Electronics', 'Economics', 'Customer Service/Support', 'Microsoft Word', 'Microsoft Outlook', 'Python', 'Sales Skills', 'Quality', 'Programming', 'Microsoft Power BI', 'Administration', 'Computer Engineering', 'Power BI', 'ETL', 'Software Development', 'Insurance'}

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1