Wget or curl to get URLs from a site's source code (no download!)

I have a subscription to a music-video site with about 4000 videos. I want to gather more information about the videos for my own needs, such as duration or subtitles, and classify them with more options than the site offers.

For this I fetch the page source, search it for .m3u8 links, then run terminal commands to get the information.

I did download the HTML files with SiteSucker (macOS), but it is a very long process and very demanding on my old MacBook...

I am now trying wget or curl, but I am looking for a way to get only the URLs I need instead of the whole HTML file.

What I need is: 1- the URL of the page and the URL of the .m3u8

2- eventually, in a second step, the duration and the languages of the available subtitles (en, fr, ja...)
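For the duration and subtitle languages, ffprobe (part of FFmpeg) can read an .m3u8 stream directly. A sketch, assuming ffprobe is installed and the stream is reachable with your session; the network calls are shown commented out, and the live part only demonstrates formatting the language list:

```shell
# Sketch: duration and subtitle languages via ffprobe (assumes FFmpeg
# is installed; m3u8_url is whatever URL you extracted from the page).
m3u8_url='https://otoplayer.philharmoniedeparis.fr/fr/m3u8/1047693.m3u8'

# Duration in seconds (network call, uncomment to run):
# ffprobe -v error -show_entries format=duration \
#         -of default=noprint_wrappers=1:nokey=1 "$m3u8_url"

# Subtitle language tags, one per line (network call, uncomment to run):
# ffprobe -v error -select_streams s -show_entries stream_tags=language \
#         -of default=noprint_wrappers=1:nokey=1 "$m3u8_url"

# Demo of post-processing that output into a comma-separated list,
# using sample language tags here instead of a live stream:
langs_raw='eng
fre
jpn'
langs="$(printf '%s\n' "$langs_raw" | paste -sd, -)"
echo "$langs"   # eng,fre,jpn
```

Both ffprobe calls print bare values thanks to `-of default=noprint_wrappers=1:nokey=1`, so they slot straight into a script.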

I also need to send my username and password, otherwise I just get a preview of the video.
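How curl can log in depends on how the site authenticates. Here is a sketch of the common cookie-based form login; the login URL and the form field names (`username`, `password`) are placeholders you must check against the site's actual login form:

```shell
# Cookie-based login sketch. LOGIN_URL, PAGE_URL and the form field
# names are hypothetical -- inspect your site's login form and adjust.
LOGIN_URL='https://example.com/login'       # hypothetical
PAGE_URL='https://example.com/video/1'      # hypothetical
JAR="$(mktemp)"                             # cookie jar file

# 1. POST the credentials once, saving the session cookie (uncomment):
# curl -s -c "$JAR" -d 'username=me' -d 'password=secret' "$LOGIN_URL" >/dev/null
# 2. Reuse the cookie for every page, writing HTML to stdout only:
# curl -s -b "$JAR" "$PAGE_URL"
# If the site uses HTTP Basic auth instead, a single call is enough:
# curl -s -u 'me:secret' "$PAGE_URL"
echo "cookie jar at: $JAR"
```

With `-c` curl writes cookies after the login POST and `-b` sends them back on later requests, so you authenticate once for the whole crawl.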

For now I have just tried things like this, but it returns the whole source code, which I don't need:

wget -q -O - --no-parent --recursive https://philharmoniedeparis.fr/fr/live/concert/

An example page: https://philharmoniedeparis.fr/fr/live/concert/1047693-insula-orchestra-laurence-equilbey — its m3u8: https://otoplayer.philharmoniedeparis.fr/fr/m3u8/1047693.m3u8
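Requirement 1 can be put together as one pipeline: read page URLs on stdin, fetch each page to stdout only (nothing saved to disk), and print tab-separated page-URL / m3u8-URL pairs. A sketch, assuming the m3u8 links appear literally in the HTML; add your authentication options to the curl call:

```shell
# Reads page URLs on stdin, one per line; prints "page<TAB>m3u8" lines.
# Add authentication options (e.g. -b cookies.txt) to curl as needed.
extract_pairs() {
  while IFS= read -r page; do
    # -s fetches the HTML to stdout silently; nothing is written to disk
    curl -s "$page" |
      grep -oE 'https?://[^"'\'' <>]+\.m3u8' |
      sort -u |
      while IFS= read -r m3u8; do
        printf '%s\t%s\n' "$page" "$m3u8"
      done
  done
}

# Usage (uncomment with your real page URLs):
# printf '%s\n' 'https://philharmoniedeparis.fr/fr/live/concert/1047693-insula-orchestra-laurence-equilbey' | extract_pairs > pairs.tsv
```

The tab-separated output imports cleanly into a spreadsheet for the classification step.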

This is not the site I'm working on, but it works the same way; its m3u8 URLs are just ten times longer...

I'm using grep with this pattern to find the m3u8 URL in the source code:

\b(http)\S+(m3u8?)
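One caveat with that pattern: `m3u8?` makes the `8` optional, so it also matches plain `m3u`, and `\S+` runs past quote characters. A variant that anchors the literal `.m3u8` extension and stops at quotes, demonstrated on an inline sample standing in for the page source:

```shell
# Sample HTML line standing in for the fetched page source:
html='<source src="https://otoplayer.philharmoniedeparis.fr/fr/m3u8/1047693.m3u8" type="application/x-mpegURL">'

# -o prints only the matched part; the class [^"' <>] stops at quotes,
# spaces and angle brackets; \.m3u8 anchors the real extension.
printf '%s\n' "$html" | grep -oE 'https?://[^"'\'' <>]+\.m3u8'
# -> https://otoplayer.philharmoniedeparis.fr/fr/m3u8/1047693.m3u8
```

Since `-o` prints only the match, the output is the bare URL with no surrounding markup to strip.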

Any help welcome



Sources

This question is from Stack Overflow and is licensed under CC BY-SA 3.0, in accordance with Stack Overflow's attribution requirements.