HTTrack: is it possible to use cookies?

I want to download a page from a URL, which should be easy enough. However, the first page requires me to log in, as I normally do in a regular browser. HTTrack only gets as far as that first page, since it can't use my cookies or log in.

Is there any way to get around this?



Solution 1:[1]

This question was asked in 2013, so I don't know whether HTTrack supported cookies back then, but it definitely does now.

Instructions:

  1. Log in to your website using Firefox or Chrome, then look up the login cookie.
  2. Inside the HTTrack folder where you are downloading your website, there should be a file named cookies.txt. If there isn't, create one.
  3. Copy the cookie information from your browser to this file. You might also have to copy your browser's user agent into the HTTrack config.
  • If you don't know how to view your cookies, you can either install an extension like Get cookies.txt to export them, or use the Developer Tools:
    Firefox: F12 -> Storage -> Cookies
    Chrome: F12 -> Application -> Storage -> Cookies

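Example of a cookies.txt for HTTrack. The file follows the Netscape cookies.txt layout, where the fields on each line are normally separated by tabs: domain, subdomain flag, path, secure flag, expiry (Unix timestamp), cookie name, cookie value.

www.httrack.com   TRUE   /         FALSE   1999999999   foo          bar
www.example.com   TRUE   /folder   FALSE   1999999999   JSESSIONID   xxx1234
www.example.com   TRUE   /hello    FALSE   1999999999   JSESSIONID   yyy1234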
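If you would rather generate the file than type it by hand, the lines in the example above can be produced with a few lines of Python. This is only a minimal sketch: it assumes the seven fields are tab-separated in the order shown in the example, and the `Cookie` class and both function names are made up for illustration and are not part of HTTrack.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Cookie:
    """One cookie copied from the browser (hypothetical helper type)."""
    domain: str
    path: str
    name: str
    value: str
    include_subdomains: bool = True
    secure: bool = False
    expires: int = 1999999999  # Unix timestamp, same placeholder as the example


def to_cookie_line(cookie: Cookie) -> str:
    """Return one cookies.txt line, assuming tab-separated fields."""
    fields = [
        cookie.domain,
        "TRUE" if cookie.include_subdomains else "FALSE",
        cookie.path,
        "TRUE" if cookie.secure else "FALSE",
        str(cookie.expires),
        cookie.name,
        cookie.value,
    ]
    return "\t".join(fields)


def write_cookies_file(path: str, cookies: Iterable[Cookie]) -> None:
    """Write one line per cookie to the given file."""
    with open(path, "w", encoding="utf-8") as handle:
        for cookie in cookies:
            handle.write(to_cookie_line(cookie) + "\n")


if __name__ == "__main__":
    # Prints the second line of the example above, with tabs between fields.
    print(to_cookie_line(Cookie("www.example.com", "/folder", "JSESSIONID", "xxx1234")))
```

Calling `write_cookies_file("cookies.txt", [...])` from inside the HTTrack project folder would then create the file described in step 2.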

Reference: http://httrack.kauler.com/help/Cookies

Solution 2:[2]

Try using cURL in PHP:

http://php.net/manual/en/book.curl.php

There are wrappers for this, like:

http://semlabs.co.uk/journal/object-oriented-curl-class-with-multi-threading

EDIT: A more specific example (not tested), using the wrapper class linked above:

<?php
require_once 'CURL.php'; // Change this to whatever the downloaded class file is called

$curl = new CURL();
$curl->retry = 2;

$opts = array(
    CURLOPT_USERAGENT      => 'Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.3) Gecko/20091020 Linux Mint/8 (Helena) Firefox/3.5.3',
    CURLOPT_COOKIEFILE     => 'fb.tmp', // Read previously stored cookies from this file
    CURLOPT_COOKIEJAR      => 'fb.tmp', // Write received cookies back to the same file
    CURLOPT_FOLLOWLOCATION => 1,        // Follow redirects after the login POST
    CURLOPT_RETURNTRANSFER => 1,        // Return the response body instead of printing it
    CURLOPT_SSL_VERIFYHOST => 0,        // Certificate checks are switched off here, which is
    CURLOPT_SSL_VERIFYPEER => 0,        // insecure; enable them outside of quick experiments
    CURLOPT_TIMEOUT        => 20,
);

$post_data = array(); // Put your login POST data here
$opts[CURLOPT_POSTFIELDS] = http_build_query($post_data);

$curl->addSession('https://www.facebook.com/messages', $opts);
$result = $curl->exec();
$curl->clear();

print_r($result);

Note that some sites require you to load a page first, so that a cookie gets set, before they will let you log in.
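The "load a page first, then log in" sequence can be sketched as follows. This is a rough illustration written in Python's standard library rather than PHP cURL, and it is not the method of the wrapper class above. The function names, and any URLs and form field names you pass in, are assumptions you would replace with the real ones for your site.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_opener(jar: http.cookiejar.CookieJar) -> urllib.request.OpenerDirector:
    """Build an opener that stores received cookies in `jar` and resends them."""
    return urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))


def encode_form(fields: dict) -> bytes:
    """URL-encode the login form fields into a POST body."""
    return urllib.parse.urlencode(fields).encode("ascii")


def log_in(opener: urllib.request.OpenerDirector,
           landing_url: str, login_url: str, fields: dict) -> str:
    """Fetch a page to pick up the initial cookie, then POST the login form."""
    # Step 1: a plain GET, so the server can set its first cookie.
    opener.open(landing_url, timeout=20).close()
    # Step 2: POST the credentials; the jar sends the cookie from step 1 along.
    with opener.open(login_url, data=encode_form(fields), timeout=20) as response:
        return response.read().decode("utf-8", errors="replace")
```

If you pass in an `http.cookiejar.MozillaCookieJar("cookies.txt")` and call its `save(ignore_discard=True)` after `log_in`, Python writes the session cookies in the Netscape cookies.txt format. Whether HTTrack accepts that file unchanged has not been tested here.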

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jaggi
Solution 2 Community