How to extract textContent from a list of classes in Puppeteer

In Puppeteer, how can I extract each date together with the list of items purchased on that date? I can extract each class separately with querySelectorAll("[class='date']") or querySelectorAll("[class='item']"), but I don't know how to get both in a single pass.

I wish to extract Jan 1, 2022, item-1, item-2, item-3, Feb 1, 2022, item-4, Mar 1, 2022, item-5, item-6.

<ul>
  <li>
    <div class="date">Jan 1, 2022</div>
    <div class="item">item-1</div>
    <div class="item">item-2</div>
    <div class="item">item-3</div>
  </li>
  <li>
    <div class="date">Feb 1, 2022</div>
    <div class="item">item-4</div>
  </li>
  <li>
    <div class="date">Mar 1, 2022</div>
    <div class="item">item-5</div>
    <div class="item">item-6</div>
  </li>
</ul>


Solution 1:[1]

I'm not 100% sure whether you want a flat array or an array organized by date (the second seems much clearer to me), so here are both, plus an object keyed by date:

// flat array of both classes
console.log(
  [...document.querySelectorAll(".date, .item")]
    .map(e => e.textContent)
);

// date array of objects
console.log(
  [...document.querySelectorAll(".date")].map(e => ({
    date: e.textContent,
    // e.parentNode would also work here in place of e.closest("li")
    items: [...e.closest("li").querySelectorAll(".item")]
      .map(e => e.textContent)
  }))
);

// object keyed by dates, if they're unique
console.log(
  [...document.querySelectorAll(".date")].reduce((a, e) => {
    a[e.textContent] =
      [...e.closest("li").querySelectorAll(".item")]
        .map(e => e.textContent)
    ;
    return a;
  }, {})
);

<ul>
  <li>
    <div class="date">Jan 1, 2022</div>
    <div class="item">item-1</div>
    <div class="item">item-2</div>
    <div class="item">item-3</div>
  </li>
  <li>
    <div class="date">Feb 1, 2022</div>
    <div class="item">item-4</div>
  </li>
  <li>
    <div class="date">Mar 1, 2022</div>
    <div class="item">item-5</div>
    <div class="item">item-6</div>
  </li>
</ul>
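
The snippets above run directly in the browser. In Puppeteer, the same logic would normally be passed to the page through page.$$eval (or page.evaluate). The following is a minimal sketch under a few assumptions: Puppeteer is installed with `npm install puppeteer`, the markup from the question is loaded with page.setContent in place of a real page, and the names groupByDate, sampleHtml and scrapeByDate are made up for illustration.

```javascript
// Callback for page.$$eval: receives the matched <li> elements in the browser
// and returns plain, serializable data grouped by date.
const groupByDate = lis =>
  lis.map(li => ({
    date: li.querySelector(".date")?.textContent.trim() ?? null,
    items: [...li.querySelectorAll(".item")].map(e => e.textContent.trim()),
  }));

// Sample markup from the question (stands in for the real page).
const sampleHtml = `
<ul>
  <li>
    <div class="date">Jan 1, 2022</div>
    <div class="item">item-1</div>
    <div class="item">item-2</div>
    <div class="item">item-3</div>
  </li>
  <li>
    <div class="date">Feb 1, 2022</div>
    <div class="item">item-4</div>
  </li>
  <li>
    <div class="date">Mar 1, 2022</div>
    <div class="item">item-5</div>
    <div class="item">item-6</div>
  </li>
</ul>`;

// Hypothetical helper: launches a browser, loads the markup, extracts the data.
const scrapeByDate = async () => {
  const puppeteer = require("puppeteer"); // assumption: installed locally
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setContent(sampleHtml);
    // groupByDate is serialized and executed inside the page
    return await page.$$eval("li", groupByDate);
  } finally {
    await browser.close();
  }
};
```

Calling `scrapeByDate().then(console.log)` should log one object per list item, each with a date string and an items array. Because the callback is serialized and run in the page, it cannot use variables from the surrounding Node scope; anything it needs has to be passed as an extra argument to page.$$eval.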

Here's how to scrape the alternate HTML structure with nested image tags, based on your new requirements from the comments and the follow-up question. Dates are read from textContent and items from each image's alt attribute:

console.log(
  [...document.querySelectorAll(".date, .item")]
    .flatMap(e =>
      e.classList.contains("item")
      ? [...e.querySelectorAll("img")]
          .map(e => e.getAttribute("alt"))
      : e.textContent
    )
);

<ul>
  <li>
    <div class="date">Mar 1, 2022</div>
    <div class="item">
      <img src="http://image.com/img3.jpg" alt="item-3">
      <img src="http://image.com/img5.jpg" alt="item-5">
    </div>
  </li>
  <li>
    <div class="date">Mar 3, 2022</div>
    <div class="item">
      <img src="http://image.com/img2.jpg" alt="item-2">
    </div>
  </li>
</ul>
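
The image variant can be run from Puppeteer in the same way. This is a sketch only, assuming Puppeteer is installed and that the markup is loaded with page.setContent; datesAndAlts, imageHtml and scrapeAlts are illustrative names, not part of any API.

```javascript
// Callback for page.$$eval: receives every .date and .item element in
// document order and flattens them into one array of strings.
const datesAndAlts = els =>
  els.flatMap(e =>
    e.classList.contains("item")
      ? [...e.querySelectorAll("img")].map(img => img.getAttribute("alt"))
      : [e.textContent.trim()]
  );

// Markup with nested images, as in the follow-up (stands in for the real page).
const imageHtml = `
<ul>
  <li>
    <div class="date">Mar 1, 2022</div>
    <div class="item">
      <img src="http://image.com/img3.jpg" alt="item-3">
      <img src="http://image.com/img5.jpg" alt="item-5">
    </div>
  </li>
  <li>
    <div class="date">Mar 3, 2022</div>
    <div class="item">
      <img src="http://image.com/img2.jpg" alt="item-2">
    </div>
  </li>
</ul>`;

// Hypothetical helper: loads the markup and returns the flat array.
const scrapeAlts = async () => {
  const puppeteer = require("puppeteer"); // assumption: installed locally
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.setContent(imageHtml);
    return await page.$$eval(".date, .item", datesAndAlts);
  } finally {
    await browser.close();
  }
};
```

The comma selector keeps dates and items in document order, which is what lets a single flatMap interleave each date with the alt texts that follow it.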

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1