'get source code of whole website - which loads additional content after scrolling down
I want to fetch this site
https://www.film-fish.com/modern-mindless-action
to fetch the IMDB IDs of all movies listed there.
The problem is that the page loads all movies listed there just after scrolling down. So, a simple wget
doesn't work.
Even if I scroll to the bottom of the page and view the source code, I do not see the last movie in the list (Hard Kill (2020)).
So the problem seems to be that the content is being created via JavaScript.
Has anybody a tip on how to achieve that?
Solution 1:[1]
So the problem seems to be that the content is being created via a js script. Has anybody a tip on how to achieve that?
Indeed, executing JavaScript code is beyond scope of GNU Wget. You would need browser automation tool. If you know some Node.js or JavaScript I suggest taking look at PhantomJS Quick Start, Page Automation. Please take look at first example in 2nd link, you should be probably able to rework to your needs, i.e. instruct page to scroll down using JavaScript then extract what you need using JavaScript.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Quentin |