How to web scrape an HTML table using PHP

I am trying to scrape the table from the following link and place it into an array.

https://www.tradingview.com/markets/currencies/cross-rates-overview-prices/

I have tried various approaches but cannot get it right.

<?php

$htmlContent = file_get_contents("https://www.tradingview.com/markets/currencies/cross-rates-overview-prices/");
    
$DOM = new DOMDocument();
$DOM->loadHTML($htmlContent);

$Header = $DOM->getElementsByTagName('th');
$Detail = $DOM->getElementsByTagName('td');

//#Get header name of the table
foreach($Header as $NodeHeader) 
{
    $aDataTableHeaderHTML[] = trim($NodeHeader->textContent);
}
//print_r($aDataTableHeaderHTML); die();

//#Get row data/detail table without header name as key
$i = 0;
$j = 0;
foreach($Detail as $sNodeDetail) 
{
    $aDataTableDetailHTML[$j][] = trim($sNodeDetail->textContent);
    $i = $i + 1;
    $j = $i % count($aDataTableHeaderHTML) == 0 ? $j + 1 : $j;
}
//print_r($aDataTableDetailHTML); die();

//#Get row data/detail table with header name as key and outer array index as row number
for($i = 0; $i < count($aDataTableDetailHTML); $i++)
{
    for($j = 0; $j < count($aDataTableHeaderHTML); $j++)
    {
        $aTempData[$i][$aDataTableHeaderHTML[$j]] = $aDataTableDetailHTML[$i][$j];
    }
}
$aDataTableDetailHTML = $aTempData; unset($aTempData);
print_r($aDataTableDetailHTML); die();

This is the error output (note that there are quite a few more lines of these warnings):

Warning: DOMDocument::loadHTML(): Tag svg invalid in Entity, line: 405 in C:\xampp\htdocs\Testing\scraper.php on line 6

Warning: DOMDocument::loadHTML(): Tag path invalid in Entity, line: 405 in C:\xampp\htdocs\Testing\scraper.php on line 6

Warning: Undefined variable $aDataTableDetailHTML in C:\xampp\htdocs\Testing\scraper.php on line 30

Fatal error: Uncaught TypeError: count(): Argument #1 ($var) must be of type Countable|array, null given in C:\xampp\htdocs\Testing\scraper.php:30 Stack trace: #0 {main} thrown in C:\xampp\htdocs\Testing\scraper.php on line 30

Your help would be much appreciated.



Solution 1:[1]

There are two problems:

  1. The warnings come from validating the HTML: the parser behind DOMDocument::loadHTML() does not recognise tags such as svg and path. Take a look at this answer on how to deal with that.

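  2. The HTML you are actually interested in (the table data) is not present in the page source. It is created by JavaScript after the page loads. Because no td elements are found, $aDataTableDetailHTML is never assigned, which is why count() fails with the fatal error. To deal with that kind of page you could use, e.g., Selenium. See this answer on how to do that in PHP.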
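
As a rough illustration of both points, here is a minimal sketch, not a definitive implementation. It assumes the HTML is already available as a string. The function name tableToArray and the sample markup are made up for this example and do not come from the question. libxml_use_internal_errors() keeps the svg/path warnings from being printed, and the result array is initialised up front so an empty page yields an empty array instead of a fatal error.

```php
<?php
// Hypothetical helper (not from the original post): parse table rows from an
// HTML string into an array of rows keyed by the header text.
function tableToArray(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects an empty string in PHP 8
    }

    // Collect libxml parse warnings (e.g. for <svg>) instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $headers = [];
    foreach ($dom->getElementsByTagName('th') as $th) {
        $headers[] = trim($th->textContent);
    }

    $rows = []; // initialised, so count($rows) is always safe
    foreach ($dom->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            $cells[] = trim($td->textContent);
        }
        // Skip header rows and rows whose cell count does not match the headers.
        if ($cells !== [] && count($cells) === count($headers)) {
            $rows[] = array_combine($headers, $cells);
        }
    }

    return $rows; // stays empty when the table is filled in by JavaScript
}

// Made-up sample markup standing in for a fully rendered page.
$sample = '<html><body><svg><path d="M0 0"/></svg><table>'
    . '<tr><th>Pair</th><th>Price</th></tr>'
    . '<tr><td>EURUSD</td><td>1.10</td></tr>'
    . '<tr><td>GBPUSD</td><td>1.25</td></tr>'
    . '</table></body></html>';

print_r(tableToArray($sample));
```

Passing the result of file_get_contents() for the URL in the question would most likely return an empty array, since the rows are not in the raw source. The function only becomes useful once the HTML comes from something that executes the page's JavaScript first, such as a Selenium-driven browser.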

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1    Siebe Jongebloed