Deleting child's child elements while web scraping and writing them to an HTML file using Node.js Puppeteer

I'm doing web scraping and writing the data to another HTML file. On the line " const content = await page.$eval('.eApVPN', e => e.innerHTML);" I fetch the inner HTML of a div. This div has multiple <p> tags, and inside those <p> tags there are multiple hyperlink (<a>) tags. I want to remove the href from those <a> tags, but I'm unable to do so.

const fs = require('fs').promises;
const helps = require('./_helpers');

const OUTDIR = './results/dataset/';

const scraperObject = {
    async scraper(browser) {
        // Create the output directory if it is missing; recursive: true does not throw if it already exists
        await fs.mkdir(OUTDIR, { recursive: true });

        const dataSet = await helps.readCSV('./results/dataset.csv');
        console.log("dataset is : ", dataSet);
        let cookies = null;
        const page = await browser.newPage();
        for (let i = 0; i < dataSet.length; i++) {
            const url = dataSet[i].coinPage;
            const filename = dataSet[i].symbol;
            try {
                console.log(`Navigating to ${url}...`);
                await page.goto(url);
                // Save the cookies from the first page only
                if (cookies == null) {
                    cookies = await page.cookies();
                    await fs.writeFile('./storage/cookies', JSON.stringify(cookies, null, 2));
                }

                await helps.autoScroll(page);

                await page.waitForSelector('.eApVPN');
                const content = await page.$eval('.eApVPN', e => e.innerHTML);
                // fs.promises.writeFile returns a promise and takes no callback; errors end up in the catch below
                await fs.writeFile(`${OUTDIR}${filename}.html`, content);
                console.log("Written to HTML successfully!");
            } catch (err) {
                console.log(err, '------->', dataSet[i].symbol);
            }
        }

        await page.close();
    }
};

module.exports = scraperObject;


Solution 1:[1]

Unfortunately, Puppeteer doesn't have a built-in method for removing nodes. However, you can use the page.evaluate method to run any JavaScript against the current document. For example, a script that removes your nodes would look something like this:

await page.evaluate((sel) => {
    // querySelectorAll returns a static NodeList, so removing elements while iterating over it is safe
    const elements = document.querySelectorAll(sel);
    for (const element of elements) {
        element.remove();
    }
}, ".eApVPN>a");

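The code above removes every <a> element that is a direct child of an element with the eApVPN class. After that, you can extract the data with your $eval selector. Note that in the question the links sit inside <p> tags, so they are not direct children of .eApVPN; to reach them, use the descendant selector ".eApVPN a" instead of ".eApVPN>a".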
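If the goal is to keep the link text and only drop the href (or drop the <a> wrapper while keeping its contents), the cleanup can also happen inside the same $eval callback that reads innerHTML. Below is a minimal sketch, assuming the .eApVPN container from the question; the function names stripHrefs and unwrapLinks are hypothetical. Both run in the browser, so they rely only on standard DOM methods (querySelectorAll, removeAttribute, replaceWith).

```javascript
// Hypothetical helpers meant to be passed to Puppeteer's page.$eval.
// They run inside the browser page, so they may only use DOM APIs, not Node modules.

// Keep every <a> element and its text, but remove its href attribute.
function stripHrefs(root) {
  root.querySelectorAll('a[href]').forEach((link) => link.removeAttribute('href'));
  return root.innerHTML;
}

// Remove every <a> element but keep its child nodes (text, <b>, etc.) in its place.
function unwrapLinks(root) {
  root.querySelectorAll('a').forEach((link) => link.replaceWith(...link.childNodes));
  return root.innerHTML;
}
```

In the scraper, pass one of them in place of the arrow function, for example const content = await page.$eval('.eApVPN', stripHrefs);. Puppeteer sends the function's source text to the page, so it must not reference variables from the Node.js side. Both helpers modify the live page, which should be harmless here because the loop navigates to the next URL afterwards.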

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Granitosaurus