Deleting child's child elements while web scraping and writing to an HTML file using Node.js Puppeteer
I'm web scraping and writing the data to another HTML file. On the line " const content = await page.$eval('.eApVPN', e => e.innerHTML);" I fetch the inner HTML of a div. This div contains multiple p tags, and inside those p tags there are multiple hyperlink (a) tags. I want to remove the href from those tags, but I'm unable to do so.
const fs = require('fs').promises;
const helps = require('./_helpers');
const OUTDIR = './results/dataset/';
const scraperObject = {
  async scraper(browser) {
    // Create the output directory (and any missing parents); no error if it already exists.
    await fs.mkdir(OUTDIR, { recursive: true });
    const dataSet = await helps.readCSV('./results/dataset.csv');
    console.log('dataset is: ', dataSet);
    let cookies = null;
    const page = await browser.newPage();
    for (let i = 0; i < dataSet.length; i++) {
      const url = dataSet[i].coinPage;
      const filename = dataSet[i].symbol;
      try {
        console.log(`Navigating to ${url}...`);
        await page.goto(url);
        // Save the cookies from the first page visited only.
        if (cookies === null) {
          cookies = await page.cookies();
          await fs.writeFile('./storage/cookies', JSON.stringify(cookies, null, 2));
        }
        await helps.autoScroll(page);
        await page.waitForSelector('.eApVPN');
        const content = await page.$eval('.eApVPN', (e) => e.innerHTML);
        // fs.promises.writeFile returns a promise and takes no callback; errors reach the catch below.
        await fs.writeFile(`${OUTDIR}${filename}.html`, content);
        console.log('Written to HTML successfully!');
      } catch (err) {
        console.log(err, '------->', dataSet[i].symbol);
      }
    }
    await page.close();
  },
};
module.exports = scraperObject;
Solution 1:[1]
Unfortunately, Puppeteer doesn't provide a dedicated method for removing DOM nodes. However, you can use the page.evaluate method to run arbitrary JavaScript in the context of the current page. For example, a script that removes your nodes could look like this:
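await page.evaluate((sel) => {
  // Remove every element matching the selector from the live page.
  for (const element of document.querySelectorAll(sel)) {
    element.remove();
  }
}, '.eApVPN a');
The above code removes every <a> element nested anywhere inside the element with the eApVPN class, including the links inside its <p> tags. A child-combinator selector such as ".eApVPN>a" would only match links that are direct children of that element, so it would miss links wrapped in <p> tags. Run this before your $eval call and the extracted innerHTML will no longer contain the links. Note that remove() deletes each link together with its text.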
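The question asks about removing the href attributes rather than the whole links. Below is a minimal sketch of that approach, with a few assumptions: the helper names innerHtmlWithoutHrefs and getCleanHtml are hypothetical, and page is assumed to be a Puppeteer Page like the one in the question. Instead of changing the live page with page.evaluate, the callback passed to page.$eval works on a deep clone of the matched element, strips href from every link in the clone, and returns the clone's innerHTML, so the link text stays in the output. Because Puppeteer serializes the callback and runs it in the browser, it cannot use variables from the Node.js side.

```javascript
// Runs in the browser via page.$eval. Puppeteer serializes this function,
// so it must not reference any variables from the Node.js side.
function innerHtmlWithoutHrefs(element) {
  // Work on a deep copy so the live page is left unchanged.
  const copy = element.cloneNode(true);
  // Strip the href attribute from every link that has one; the link text stays.
  for (const link of copy.querySelectorAll('a[href]')) {
    link.removeAttribute('href');
  }
  return copy.innerHTML;
}

// Runs in Node.js. `page` is assumed to be a Puppeteer Page.
async function getCleanHtml(page, selector) {
  await page.waitForSelector(selector);
  return page.$eval(selector, innerHtmlWithoutHrefs);
}
```

In the question's loop, the $eval line could then become const content = await getCleanHtml(page, '.eApVPN'); since the helper already waits for the selector. To drop the link markup entirely while keeping its text, the loop could call link.replaceWith(...link.childNodes) instead of link.removeAttribute('href').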
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Granitosaurus |
