How to extract textContent from a list of classes in Puppeteer
In Puppeteer, how can I extract each date together with the list of items purchased on that date? I can extract each class separately with querySelectorAll("[class='date']") or querySelectorAll("[class='item']"), but I don't know how to get both in one pass.
I wish to extract Jan 1, 2022, item-1, item-2, item-3, Feb 1, 2022, item-4, Mar 1, 2022, item-5, item-6.
<ul>
<li>
<div class="date">Jan 1, 2022</div>
<div class="item">item-1</div>
<div class="item">item-2</div>
<div class="item">item-3</div>
</li>
<li>
<div class="date">Feb 1, 2022</div>
<div class="item">item-4</div>
</li>
<li>
<div class="date">Mar 1, 2022</div>
<div class="item">item-5</div>
<div class="item">item-6</div>
</li>
</ul>
Solution 1:[1]
I'm not 100% sure whether you want a flat array or an array organized by date (the second option seems much clearer to me), so here are both, plus an object keyed by date:
// flat array of both classes, in document order
console.log(
  [...document.querySelectorAll(".date, .item")]
    .map(e => e.textContent)
);
// array of objects, one per date
console.log(
  [...document.querySelectorAll(".date")].map(e => ({
    date: e.textContent,
    // e.parentNode also works here, since each .date is a direct child of its <li>
    items: [...e.closest("li").querySelectorAll(".item")]
      .map(e => e.textContent)
  }))
);
// object keyed by dates, if they're unique
console.log(
  [...document.querySelectorAll(".date")].reduce((a, e) => {
    a[e.textContent] =
      [...e.closest("li").querySelectorAll(".item")]
        .map(e => e.textContent);
    return a;
  }, {})
);
<ul>
<li>
<div class="date">Jan 1, 2022</div>
<div class="item">item-1</div>
<div class="item">item-2</div>
<div class="item">item-3</div>
</li>
<li>
<div class="date">Feb 1, 2022</div>
<div class="item">item-4</div>
</li>
<li>
<div class="date">Mar 1, 2022</div>
<div class="item">item-5</div>
<div class="item">item-6</div>
</li>
</ul>
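The snippets above run directly in the browser. In Puppeteer, the same logic has to execute inside the page, for example through page.$$eval. What follows is a minimal sketch under assumptions that are not part of the original question: the names groupByDate and scrape are hypothetical, the URL is whatever page holds the list, and the puppeteer package is assumed to be installed.

```javascript
// Runs in the browser context: pairs each .date element with the text of
// every .item inside the same <li>. It must not reference outer variables,
// because Puppeteer serializes the function before sending it to the page.
const groupByDate = (dateEls) =>
  dateEls.map((el) => ({
    date: el.textContent.trim(),
    items: [...el.closest("li").querySelectorAll(".item")].map((item) =>
      item.textContent.trim()
    ),
  }));

// Hypothetical entry point; `url` is a placeholder supplied by the caller.
async function scrape(url) {
  const puppeteer = require("puppeteer"); // loaded lazily, only when scraping
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: "domcontentloaded" });
    // $$eval runs querySelectorAll(".date") in the page and passes the
    // matched elements to groupByDate as an array.
    return await page.$$eval(".date", groupByDate);
  } finally {
    await browser.close(); // always release the browser, even on errors
  }
}
```

Assuming the page contains the markup from the question, scrape(url) should resolve to an array of { date, items } objects. For the flat list instead, the same pattern applies with the selector ".date, .item" and a callback that maps each element to its textContent.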
Here's how to scrape the alternate HTML structure with nested image tags, per the new requirements in the comments and the follow-up question. Each item's text now comes from the alt attribute of the img elements inside it:
console.log(
  [...document.querySelectorAll(".date, .item")]
    .flatMap(e =>
      e.classList.contains("item")
        ? [...e.querySelectorAll("img")]
            .map(e => e.getAttribute("alt"))
        : e.textContent
    )
);
<ul>
<li>
<div class="date">Mar 1, 2022</div>
<div class="item">
<img src="http://image.com/img3.jpg" alt="item-3">
<img src="http://image.com/img5.jpg" alt="item-5">
</div>
</li>
<li>
<div class="date">Mar 3, 2022</div>
<div class="item">
<img src="http://image.com/img2.jpg" alt="item-2">
</div>
</li>
</ul>
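If the grouped shape is preferred for this image-based structure too, one option is to iterate over the list entries rather than the dates. This is only a sketch: groupAltsByDate and scrapeAlts are hypothetical names, and `page` is assumed to be a Puppeteer Page that has already navigated to the document.

```javascript
// Runs in the browser context: one result per <li>, pairing the date text
// with the alt text of every image nested in that entry's .item blocks.
const groupAltsByDate = (listItems) =>
  listItems
    .filter((li) => li.querySelector(".date")) // skip <li> elements without a date
    .map((li) => ({
      date: li.querySelector(".date").textContent.trim(),
      items: [...li.querySelectorAll(".item img")].map((img) =>
        img.getAttribute("alt")
      ),
    }));

// Hypothetical helper: `page` is an already-open Puppeteer Page.
const scrapeAlts = (page) => page.$$eval("li", groupAltsByDate);
```

The "li" selector matches every list item on the page, so on a real site it would probably need narrowing to the list that holds the purchases.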
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
