HTTrack: is it possible to use cookies?
I want to download a page from a URL, which is easy enough. But the site makes me log in on the first page, as I normally do in a regular browser. HTTrack only downloads that first page, since it can't use my cookies or log in.
Is there any way for me to get around this?
Solution 1:[1]
This question was asked in 2013, so I don't know whether HTTrack supported cookies back then, but it definitely does now.
Instructions:
- Log in to your website using Firefox or Chrome, then look at the login cookie.
- Inside the HTTrack folder where you are downloading your website, there should be a file named cookies.txt. If not, create one.
- Copy the cookie information from your browser to this file. You might also have to copy your user agent from your browser to the HTTrack config.
- If you don't know how to look at your cookies, it's pretty simple. You can either install an extension like Get cookies.txt to export cookies, or use the Developer Tools like so:
Firefox: F12 -> Storage -> Cookies
Chrome: F12 -> Application -> Storage -> Cookies
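The user-agent step above could be scripted roughly as follows. This is a minimal sketch, not a tested recipe: it assumes the `httrack` command-line binary is installed, that `-O` selects the project folder and `-F` sets the user-agent string, and the URL, folder, and user-agent values are placeholders.

```python
import shlex


def httrack_command(url, project_dir, user_agent):
    """Build an argument list for the httrack CLI (assumed to be on PATH)."""
    # -O: output/project folder, where this answer expects cookies.txt to sit.
    # -F: User-Agent header that HTTrack should send, copied from your browser.
    return ["httrack", url, "-O", project_dir, "-F", user_agent]


# Placeholder values; replace them with your own site, folder, and user agent.
cmd = httrack_command(
    "https://www.example.com/folder/",
    "./mirror",
    "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
)
print(shlex.join(cmd))  # shell-quoted form, handy for copy and paste
```

To actually start the mirror from Python, the list can be passed to `subprocess.run(cmd, check=True)`.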
Example of a cookies.txt for HTTrack:
www.httrack.com TRUE / FALSE 1999999999 foo bar
www.example.com TRUE /folder FALSE 1999999999 JSESSIONID xxx1234
www.example.com TRUE /hello FALSE 1999999999 JSESSIONID yyy1234
Reference: http://httrack.kauler.com/help/Cookies
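If you would rather generate the file than type it by hand, something like the sketch below may help. It assumes HTTrack expects the tab-separated Netscape cookie layout that the example above appears to follow (domain, include-subdomains flag, path, secure flag, expiry as a Unix timestamp, name, value). The function names and default values are hypothetical, chosen to mirror the example lines.

```python
from pathlib import Path


def cookie_line(domain, path, name, value,
                include_subdomains=True, secure=False, expires=1999999999):
    """Format one cookie as a tab-separated, seven-field line."""
    def flag(b):
        return "TRUE" if b else "FALSE"
    fields = [domain, flag(include_subdomains), path, flag(secure),
              str(expires), name, value]
    return "\t".join(fields)


def write_cookies(target, cookies):
    """Write one line per cookie dict to `target` and return the lines."""
    lines = [cookie_line(**c) for c in cookies]
    Path(target).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return lines


# Values copied from the browser's cookie view (placeholders here).
lines = write_cookies("cookies.txt", [
    {"domain": "www.example.com", "path": "/folder",
     "name": "JSESSIONID", "value": "xxx1234"},
])
```

Point `target` at the cookies.txt inside your HTTrack project folder, and set `secure=True` for cookies that the browser marks as HTTPS-only.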
Solution 2:[2]
Try using cURL in PHP:
http://php.net/manual/en/book.curl.php
There are wrappers for this, like:
http://semlabs.co.uk/journal/object-oriented-curl-class-with-multi-threading
EDIT: Here is a more specific example of the options to use (not tested).
Download the class from:
http://semlabs.co.uk/journal/object-oriented-curl-class-with-multi-threading
require_once( 'CURL.php' ); // Change this to whatever the class file downloaded above is called
$curl = new CURL();
$curl->retry = 2;
$opts = array(
    CURLOPT_USERAGENT => 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.3) Gecko/20091020 Linux Mint/8 (Helena) Firefox/3.5.3',
    CURLOPT_COOKIEFILE => 'fb.tmp', // read cookies from this file
    CURLOPT_COOKIEJAR => 'fb.tmp', // write received cookies back to the same file
    CURLOPT_FOLLOWLOCATION => 1, // follow redirects
    CURLOPT_RETURNTRANSFER => 1, // return the response body instead of printing it
    CURLOPT_SSL_VERIFYHOST => 0, // skips the certificate host name check (insecure)
    CURLOPT_SSL_VERIFYPEER => 0, // skips certificate verification (insecure)
    CURLOPT_TIMEOUT => 20 // seconds
);
$post_data = array( ); // put your login POST data here
$opts[CURLOPT_POSTFIELDS] = http_build_query( $post_data ); // setting POST fields makes this a POST request
$curl->addSession( 'https://www.facebook.com/messages', $opts );
$result = $curl->exec();
$curl->clear();
print_r( $result );
Note that some sites require you to load a page first, so that a cookie gets set, before they will let you log in.
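The load-first-then-log-in sequence from that note can be sketched with Python's standard library. This illustrates the idea only and is not a port of the PHP class above: `build_session` and `login` are hypothetical helper names, and the URLs and form field names depend entirely on the target site.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_session():
    """Return (opener, jar); the opener stores and resends cookies via the jar."""
    jar = CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def login(opener, landing_url, login_url, form_fields, timeout=20):
    """Load a page first so the server can set a cookie, then POST the login form."""
    # Step 1: plain GET; any Set-Cookie headers end up in the opener's jar.
    opener.open(landing_url, timeout=timeout).close()
    # Step 2: POST the form; the jar sends the cookie from step 1 back.
    body = urllib.parse.urlencode(form_fields).encode("utf-8")
    with opener.open(login_url, data=body, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

A call might look like `login(opener, "https://www.example.com/", "https://www.example.com/login", {"user": "me", "pass": "secret"})`, where the field names are whatever the site's login form actually uses. Afterwards the jar holds the session cookies, which could then be copied into cookies.txt.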
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jaggi |
| Solution 2 | Community |
