SIMPLEHTMLDOM_1_9_1: str_get_html fails when working with a large source file

So I'm pretty new to simplehtmldom; I've been playing with it for a few hours now, trying to scrape data from a large HTML file.

While testing, I modified the source HTML to keep only 50 records while leaving everything else the same. I was able to finish my script and it worked perfectly. I then tried to use the actual large file (4500 records, 8 MB), and at first the script simply crashed with no error message.

After a lot of Googling, I changed this in simple_html_dom.php:

<!-- language: php -->
defined('MAX_FILE_SIZE') || define('MAX_FILE_SIZE', 128000000);
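For what it's worth, my reading of simple_html_dom is that `MAX_FILE_SIZE` only gates an up-front size check (`str_get_html()` returns false when the input exceeds it); it does not raise PHP's own memory ceiling, which is controlled by `memory_limit` alone. A rough illustration of the two separate limits (the 9 MB figure is made up to mirror the 8 MB file):

```php
<?php
// Illustration only: MAX_FILE_SIZE is a size check inside simple_html_dom,
// not a memory setting. An input can pass the size check and still blow
// past memory_limit once it is parsed into PHP objects.
$input = str_repeat('x', 9000000);              // ~9 MB, like the 8 MB source file
$passesSizeCheck = strlen($input) <= 128000000; // the raised MAX_FILE_SIZE
echo $passesSizeCheck ? "size check ok\n" : "too large\n";
```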

I also changed this in my main script:

<!-- language: php -->
ini_set('max_execution_time', 120);
ini_set('max_input_time', 120);
ini_set('default_socket_timeout', 120);
ini_set('memory_limit', '128M');
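One observation (mine, not from the original post): the 134217728 bytes in the fatal error is exactly 128 × 1024 × 1024, i.e. the `128M` configured here is the very limit being exhausted. So the setting is taking effect; the ceiling is just too low for the parsed DOM tree. A sketch of raising it further (512M is an arbitrary guess, not a recommendation):

```php
<?php
// 134217728 bytes == 128 * 1024 * 1024, i.e. exactly the '128M' limit
// set above. Raising the ceiling is one obvious thing to try:
ini_set('memory_limit', '512M');
echo 128 * 1024 * 1024, "\n"; // 134217728, matching the fatal error
```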

Despite all this, the script still crashes at the str_get_html line:

<!-- language: php -->
function getThings($filePath) {
  require_once('../assets/simplehtmldom_1_9_1/simple_html_dom.php');
  $html_string = file_get_contents($filePath);
  $html = str_get_html($html_string); // Crashes here
  // do more stuff
}
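Not part of the original question, but in case it helps someone hitting the same wall: PHP's built-in DOMDocument (the C-based ext/dom) typically needs far less memory per node than simple_html_dom's PHP-object tree, so a sketch like the following may survive the 8 MB file where `str_get_html()` does not. The `div[@class="record"]` selector is a placeholder assumption, adapt it to the real markup:

```php
<?php
// Hedged sketch: parse the large file with ext/dom instead of
// simple_html_dom. The record selector below is hypothetical.
function getThingsWithDom(string $filePath): array {
    $doc = new DOMDocument();
    // Suppress warnings from imperfect real-world HTML.
    libxml_use_internal_errors(true);
    $doc->loadHTMLFile($filePath);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $records = [];
    // Hypothetical example: each record is a <div class="record">.
    foreach ($xpath->query('//div[@class="record"]') as $node) {
        $records[] = trim($node->textContent);
    }
    return $records;
}
```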

It crashes with this error:

Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /path/to/simplehtmldom_1_9_1/simple_html_dom.php on line 2124

I am at my wits' end and sleep deprived. Any help would be appreciated!

Thanks!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
