Scraping each link from sitemap.xml

I'm new to Apify.

I would like to scrape each link in the sitemap.xml.

More specifically, my sitemap URL is: https://www.mywebsite.com/sitemap.xml

The links in the sitemap look like: https://www.mywebsite.com/product_id/product

e.g.: https://www.mywebsite.com/534372/acer_laptop

Is there a way to extract the following elements from each link: title, product_image_url, price?

I tried Web Scraper and Legacy PhantomJS Crawler, but I think I'm missing something because I can't get the elements I need.

apify

Solution 1:^[1]

For increased performance, either:

- make sure you disable these options in advanced settings:
  - Download media files
  - Download CSS files
- look into using Cheerio Scraper instead of Web Scraper or Puppeteer Scraper, if you aren't already: https://docs.apify.com/scraping/cheerio-scraper
- request a custom optimized solution on the Apify Marketplace: https://apify.com/marketplace
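If you go the Cheerio Scraper route, the three fields from the question could be collected in the Page function roughly as below. This is only a sketch: `context.$` and `context.request` are what Cheerio Scraper passes to the page function, but all three selectors (`h1`, `meta[property="og:image"]`, `[itemprop="price"]`) are assumptions, because the real markup of the product pages isn't shown. Replace them with whatever you find when you inspect a product page.

```javascript
// Page function sketch for Apify Cheerio Scraper.
// NOTE: all three selectors are assumptions; adapt them to the real product page markup.
async function pageFunction(context) {
    const { $, request } = context;

    // Product name, assumed to be the first <h1> on the page.
    const title = $('h1').first().text().trim();

    // Main image, assumed to be exposed through an Open Graph meta tag.
    const productImageUrl = $('meta[property="og:image"]').attr('content') || null;

    // Price, assumed to be marked up with a schema.org itemprop attribute.
    const price = $('[itemprop="price"]').first().text().trim();

    // The returned object becomes one record in the scraper's results.
    return {
        url: request.url,
        title,
        product_image_url: productImageUrl,
        price,
    };
}
```

The price comes back as the raw text found on the page, so currency symbols and thousands separators would still need cleaning up afterwards.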

Solution 2:^[2]

Consider writing a function using Puppeteer. Open the sitemap in your browser and look for the tag or class name that wraps a single link. This function could be a good start. I'm going to try it myself and see if it works.

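  const puppeteer = require("puppeteer");

  // Opens the sitemap in headless Chrome and returns the text of every matched element.
  async function scrap() {
    const browser = await puppeteer.launch({
      headless: true,
      args: ["--no-sandbox", "--disable-setuid-sandbox"],
    });

    const page = await browser.newPage();

    await page.goto(`https://yourpage.it/sitemap.xml`);

    const data = await page.evaluate(() => {
      // querySelectorAll returns a NodeList, which has no innerHTML of its own,
      // so read the text of each matched element individually.
      // Adjust the selector to the tag or class you see when inspecting the sitemap.
      const links = Array.from(
        document.querySelectorAll(".html-tag > span"),
        (el) => el.textContent.trim()
      );

      return { links };
    });

    await page.close();
    await browser.close();
    return data;
  }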
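Since a sitemap is plain XML, the link list can also be obtained without launching a browser at all. The sketch below is one way to do that, assuming a flat sitemap whose URLs sit in plain `<loc>` elements (no CDATA sections, no nested sitemap index) and Node.js 18+ for the built-in `fetch`. The function names `extractSitemapUrls` and `fetchSitemapUrls` are made up for illustration.

```javascript
// Extracts every <loc> URL from a sitemap XML string.
// A regex is enough for a flat sitemap; use a real XML parser for anything more complex.
function extractSitemapUrls(xml) {
    const urls = [];
    const locPattern = /<loc>\s*([\s\S]*?)\s*<\/loc>/gi;
    let match;
    while ((match = locPattern.exec(xml)) !== null) {
        // Sitemaps escape "&" as "&amp;" inside URLs, so undo that.
        urls.push(match[1].replace(/&amp;/g, '&'));
    }
    return urls;
}

// Downloads the sitemap and returns its URLs (relies on the global fetch in Node.js 18+).
async function fetchSitemapUrls(sitemapUrl) {
    const response = await fetch(sitemapUrl);
    if (!response.ok) {
        throw new Error(`Sitemap request failed with status ${response.status}`);
    }
    return extractSitemapUrls(await response.text());
}
```

Calling `fetchSitemapUrls("https://www.mywebsite.com/sitemap.xml")` would resolve to an array of product URLs, which can then be visited one by one or passed to a scraper as start URLs.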

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Vasek Tobey Vlcek
Solution 2	Vincenzo
