SIMPLEHTMLDOM_1_9_1: str_get_html fails when working with a large source file
I'm pretty new to simplehtmldom; I've been playing with it for a few hours now, trying to scrape data from a large HTML file.
While testing, I modified the source HTML to keep only 50 records, leaving everything else the same. I was able to finish my script and it worked perfectly. When I switched to the actual large file (4500 records, 8 MB), I initially got a plain crash with no error message at all.
After a lot of Googling, I changed this in simple_html_dom.php:
<!-- language: php -->
defined('MAX_FILE_SIZE') || define('MAX_FILE_SIZE', 128000000);
I also changed this in my main script:
<!-- language: php -->
ini_set('max_execution_time', 120);
ini_set('max_input_time', 120);
ini_set('default_socket_timeout', 120);
ini_set('memory_limit', '128M');
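(In case it matters: `ini_set()` can fail silently when a directive is locked down by the host, so a minimal check along these lines could confirm the settings actually took effect at runtime:)

```php
<?php
// Sanity check: dump the effective values after calling ini_set(),
// since ini_set() returns false instead of throwing when a directive
// cannot be changed at runtime.
var_dump(ini_get('memory_limit'));        // expect "128M"
var_dump(ini_get('max_execution_time'));  // expect "120"
```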
Despite all this, the script still crashes at the str_get_html line:
<!-- language: php -->
function getThings($filePath) {
    require_once('../assets/simplehtmldom_1_9_1/simple_html_dom.php');

    $html_string = file_get_contents($filePath);
    $html = str_get_html($html_string); // Crashes here
    // do more stuff
}
It crashes with this error:
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in /path/to/simplehtmldom_1_9_1/simple_html_dom.php on line 2124
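(Worth noting: 134217728 bytes is exactly the 128M `memory_limit` I set above, so it looks like the parser is hitting that ceiling rather than `MAX_FILE_SIZE`. A quick arithmetic check:)

```php
<?php
// 128M expressed in bytes matches the byte count in the fatal error
// exactly, which suggests the 128M memory_limit is the limit being hit.
echo 128 * 1024 * 1024; // 134217728
```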
I am at my wits' end, and sleep deprived. Any help would be appreciated!
Thanks!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow