DOM Parser Chrome extension memory leak

The problem

I have developed an extension that intercepts web requests, gets the HTML of the page each request originated from, and processes it. I use the DOMParser to parse the HTML, and I have realised that the DOMParser is causing a massive memory leak, which eventually causes the Chrome extension to crash.

This is the code that causes the issue: https://gist.github.com/uche1/20929b6ece7d647250828c63e4a2ffd4

What I've tried

Dev Tools Recorded Performance

I recorded the Chrome extension's performance while it was intercepting requests, and I noticed that each call to DOMParser.parseFromString created more nodes and documents, none of which were destroyed.

Dev tools screenshot https://i.imgur.com/pMY50kR.png

Task Manager Memory Footprint

I looked at Chrome's Task Manager and saw that the extension had a huge memory footprint that did not decrease over time, even though garbage collection should kick in after a while. When the memory footprint gets too large, the extension crashes.

Task manager memory footprint screenshot https://i.imgur.com/c8fLWCy.png

Heap snapshots

I took before and after snapshots of the heap, and the issue seems to originate from HTMLDocument objects that are allocated but never garbage collected.

Snapshot (before) https://i.imgur.com/Rg2CRi6.png

Snapshot (after) https://i.imgur.com/UQgLuT1.png

Expected outcome

I would like to understand why the DOMParser is causing such memory issues, why the garbage collector isn't cleaning it up, and what I can do to resolve it.

Thanks



Solution 1:[1]

I have resolved the problem. The issue seems to be that the DOMParser class, for some reason, kept references to the HTML documents it parsed in memory and did not release them. Because my Chrome extension runs in the background, the problem is exaggerated.

The solution was to parse the HTML another way, using a template element:

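const parseHtml = (html) => {
    // A <template> element parses the markup into an inert DocumentFragment.
    const template = document.createElement('template');
    template.innerHTML = html;
    return template; // query the parsed nodes through template.content
};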

This helped resolve the issue.
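
As a minimal sketch of how the template approach might be used, the helper below reads the page title from the parsed markup. The name extractTitleViaTemplate and the injectable doc parameter are assumptions made for illustration and are not part of the original answer; doc defaults to the background page's global document.

```javascript
// Hypothetical helper: reads the <title> text through a <template> element.
// `doc` is injectable for testing; by default it is the global document.
function extractTitleViaTemplate(html, doc = globalThis.document) {
  const template = doc.createElement('template');
  template.innerHTML = html;
  // The parsed nodes live in template.content, not in the template itself.
  const titleElement = template.content.querySelector('title');
  // Return a plain string (or null) so no DOM node leaves this function.
  return titleElement ? titleElement.textContent : null;
}
```

Note that a template parses its markup as a fragment, so the resulting tree may not match what DOMParser builds for a full document.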

Solution 2:[2]

You are basically replicating the entire DOM in memory and then never releasing the memory.

We get away with this in a client-side app because, when we navigate away, the memory used by the scripts on that page is recovered.

In a background script, that doesn't happen, and releasing the memory is now your responsibility.

So set both the parser and the document to null when you are done using them.

chrome.webRequest.onCompleted.addListener(async request => {
    if (request.tabId !== -1) {
        let html = await getHtmlForTab(request.tabId);
        let parser = new DOMParser();
        let document = parser.parseFromString(html, "text/html");
        // Guard against pages that have no <title> element.
        let title = document.querySelector("title")?.textContent;
        console.log(title);
        parser = null; // <----- DO THIS
        document = null; // <----- DO THIS
    }
}, requestFilter);
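
One way to make the "release it when you are done" idea harder to get wrong is to confine the parser and the parsed document to a small helper that returns only plain data. The sketch below is an illustration under assumptions: readTitle and the injectable createParser parameter are hypothetical names, and whether this scoping is enough to avoid the leak in Chromium is not confirmed here.

```javascript
// Hypothetical helper: keeps the parser and parsed document local.
// `createParser` is injectable; by default it builds a native DOMParser.
function readTitle(html, createParser = () => new DOMParser()) {
  const parser = createParser();
  const parsedDocument = parser.parseFromString(html, 'text/html');
  const titleElement = parsedDocument.querySelector('title');
  // Only a string (or null) is returned, so the caller never holds
  // a reference to the parser or to any node of the parsed document.
  return titleElement ? titleElement.textContent : null;
}
```

A listener would then call readTitle(html) and log the result, without ever declaring parser or document variables of its own.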

Solution 3:[3]

I cannot point to a confirmed bug report in Chromium, but we were also hit by the memory leak. If you are developing an extension, DOMParser will leak in background scripts on Chromium-based browsers, but not on Firefox.

We could not get any of the workarounds mentioned here to solve the leak, so we ended up replacing the native DOMParser with the linkedom library, which provides a drop-in replacement and works in the browser (not only in Node.js). It solves the leaks, so you might consider it, but there are some aspects that you need to be aware of:

  • It will not leak, but its initial memory footprint is higher than that of the native parser
  • Performance is most likely slower (but I have not benchmarked it)
  • The DOM generated by its HTML parser might differ slightly from what Firefox or Chrome would produce. The effect is most visible with broken HTML, which the browsers attempt to error-correct.

We also tried jsdom first, which tries to be more compatible with the major browsers at the cost of a more complex codebase. Unfortunately, we found it difficult to make jsdom work in the browser (on Node.js it works well).
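
To make such a swap less invasive, the parsing code can take the parser class as a parameter instead of using the global DOMParser directly. The sketch below assumes linkedom is bundled with the extension and that its exported DOMParser is passed in; makeTitleReader and ParserImpl are hypothetical names, not part of the original answer.

```javascript
// Assumption: in the extension, the parser class would come from linkedom:
//   import { DOMParser as LinkedomParser } from 'linkedom';
//   const readTitle = makeTitleReader(LinkedomParser);
// Hypothetical factory: builds a title reader around any DOMParser-like class.
function makeTitleReader(ParserImpl) {
  return (html) => {
    const parsed = new ParserImpl().parseFromString(html, 'text/html');
    const titleElement = parsed.querySelector('title');
    return titleElement ? titleElement.textContent : null;
  };
}
```

With this shape, switching back to the native parser (for example on Firefox) only means passing a different class to makeTitleReader.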

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Coder Guy
Solution 2
Solution 3    Philipp Claßen