querySelectorAll doesn't capture all elements

I am trying to scan and manipulate the DOM of a webpage with the following code:

var elements = document.querySelectorAll('*');
for (var i = 0; i < elements.length; i++) {
   if (!elements[i].firstElementChild) { 
       if (elements[i].innerHTML != "" ){ 
           elements[i].innerHTML = "abc_"+ elements[i].innerHTML+"_123";
       }
   }
}

While it works well on many pages, it does not pick up all the elements on one specific page, which is my real target. On that page, it captures and edits the strings of a few elements, but not all of them. I have also tried using getElementsByTagName().

The elements that are not captured have an XPath such as:

/html/body/div[4]/div[2]/div/div[2]/div/div/div/div/div[1]/div/div[2]/nav/div/div[1]/div/span/div/text()[1]

I also noticed "flex" written in front of these elements.

I also tried the script by Douglas Crockford, but it is also unable to catch the elements described above.

The script by Douglas is published at

https://www.javascriptcookbook.com/article/traversing-dom-subtrees-with-a-recursive-walk-the-dom-function/

function walkTheDOM(node, func) {
    func(node);
    node = node.firstChild;
    while (node) {
        walkTheDOM(node, func);
        node = node.nextSibling;
    }
}

// Example usage: Process all Text nodes on the page
walkTheDOM(document.body, function (node) {
    if (node.nodeType === 3) { // Is it a Text node?
        var text = node.data.trim();
        if (text.length > 0) { // Does it have non white-space text content?
            // process text
        }
    }
});

Any idea what I am doing wrong?

Here is a screenshot of the element inspector: [image not included]



Solution 1:[1]

In your snippet, you are not selecting all the nodes, because document.querySelectorAll('*') selects only elements, not text nodes.

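Besides, you are explicitly ignoring text nodes that sit next to elements, because your check on .firstElementChild only lets through elements that have no child elements. An element that contains text alongside child elements is skipped, so its text is never edited. A text node is not an element. An element in the DOM is a "tag", for example <div>, and has nodeType 1; a text node has nodeType 3.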

So, if you were to process, for example:

OuterTextNode<div>InnerTextNode</div>

the div would be the only element, while OuterTextNode and InnerTextNode are text nodes. Both the query selector and .firstElementChild would select only the element (div) here.
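
To make the difference concrete, here is a minimal sketch that models the markup above with plain objects instead of real DOM nodes, so it also runs outside a browser. The helpers el, text, collect and hasElementChild are hypothetical names used only for illustration; only the nodeType values (1 for elements, 3 for text nodes) mirror the real DOM.

```javascript
// Plain-object stand-ins for DOM nodes (assumption: only nodeType,
// nodeName, data and childNodes are modelled here).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

const el = (nodeName, ...childNodes) => ({ nodeType: ELEMENT_NODE, nodeName, childNodes });
const text = (data) => ({ nodeType: TEXT_NODE, nodeName: '#text', data, childNodes: [] });

// <body>OuterTextNode<div>InnerTextNode</div></body>
const body = el('BODY', text('OuterTextNode'), el('DIV', text('InnerTextNode')));

// Collect every descendant of `root` with the given nodeType, in document order.
function collect(root, nodeType, found = []) {
  for (const child of root.childNodes) {
    if (child.nodeType === nodeType) found.push(child);
    collect(child, nodeType, found);
  }
  return found;
}

// What an element-only query would see below <body>: just the div.
const elementNames = collect(body, ELEMENT_NODE).map((n) => n.nodeName);

// Where the text actually lives: two text nodes.
const textData = collect(body, TEXT_NODE).map((n) => n.data);

// The question's filter: only elements WITHOUT element children get edited.
const hasElementChild = (node) => node.childNodes.some((c) => c.nodeType === ELEMENT_NODE);
const edited = [body, ...collect(body, ELEMENT_NODE)]
  .filter((n) => !hasElementChild(n))
  .map((n) => n.nodeName);

console.log(elementNames); // only DIV
console.log(textData);     // OuterTextNode and InnerTextNode
console.log(edited);       // only DIV, so OuterTextNode inside BODY is never touched
```

In this model BODY is filtered out because it has an element child, which is why the text sitting directly inside it is skipped.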

It should work with the DOM-tree-walking code:

const blackList = ['script']; // here you could add some node names that you want to ignore

function walkTheDOM(node, func) {
  func(node);
  node = node.firstChild;
  while (node) {
    if (!blackList.includes(node.nodeName.toLowerCase())) {
      walkTheDOM(node, func);
    }
    node = node.nextSibling;
  }
}

walkTheDOM(document.body, function(node) {
  if (node.nodeType === 3) {
    var text = node.data.trim();
    if (text.length > 0) {
      console.log(text);
      console.log(`replaced: PREFIX_${text}_POSTFIX`);
    }
  }
});

Sample markup the snippet runs against:

<div>
  All
  <span>In span</span> Some more text
  <div>
    <div>
      Some nested text
      <div>Sibling</div>
      <span>
      Another
      Another
      <span>
        Deep
        <span>
          <span>
            <span>
              <span>
                <span>Deeper</span>
      </span>
      </span>
      </span>
      </span>
      </span>
      </span>
    </div>
    <!-- Some comment !-->
    <script>
      // some script
      const foo = 'foo';
    </script>
  </div>
</div>
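
The snippet above only logs the replacement. As a rough sketch of actually writing it back, one option is to assign to the text node's data property instead of rewriting innerHTML. The wrapText helper, the makeNode/el/text stand-ins and the abc_/_123 strings (taken from the question) are illustrative assumptions; the nodes are plain objects so that the sketch also runs outside a browser. With a real page you would pass document.body to walkTheDOM instead.

```javascript
// Plain-object stand-ins for DOM nodes (assumption: only nodeType, nodeName,
// data, firstChild and nextSibling are modelled here).
function makeNode(nodeType, nodeName, data, children = []) {
  children.forEach((child, i) => { child.nextSibling = children[i + 1] || null; });
  return { nodeType, nodeName, data, firstChild: children[0] || null, nextSibling: null };
}
const el = (nodeName, ...children) => makeNode(1, nodeName, null, children);
const text = (data) => makeNode(3, '#text', data);

const blackList = ['script']; // node names whose subtrees are skipped

function walkTheDOM(node, func) {
  func(node);
  node = node.firstChild;
  while (node) {
    if (!blackList.includes(node.nodeName.toLowerCase())) {
      walkTheDOM(node, func);
    }
    node = node.nextSibling;
  }
}

// Wrap the non-whitespace part of a text node, keeping surrounding whitespace.
function wrapText(node, prefix, postfix) {
  if (node.nodeType !== 3) return;          // only text nodes
  const trimmed = node.data.trim();
  if (trimmed.length === 0) return;         // skip whitespace-only nodes
  const start = node.data.indexOf(trimmed); // length of the leading whitespace
  node.data = node.data.slice(0, start) + prefix + trimmed + postfix +
    node.data.slice(start + trimmed.length);
}

// Mixed content: text next to elements, plus a script that must stay untouched.
const scriptText = text("const foo = 'foo';");
const body = el('BODY',
  text('\n  All\n  '),
  el('SPAN', text('In span')),
  text(' Some more text\n  '),
  el('SCRIPT', scriptText));

walkTheDOM(body, (node) => wrapText(node, 'abc_', '_123'));

// Gather the text nodes that the walk reaches, to inspect the result.
const result = [];
walkTheDOM(body, (node) => { if (node.nodeType === 3) result.push(node.data); });
console.log(result);
```

Writing to data changes only the text node itself, so sibling elements and their event handlers are left alone, which is not the case when innerHTML is reassigned.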

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1