Copying a web page's content with its styles using JSoup

Is it possible to use JSoup to copy a web page's content, including its styles, the way a browser can? For example, when you save a page to the local disk with Firefox, you get the content and also the styles.

Take Wikipedia pages, for example: the styles appear to be served by a PHP script. The integrated Firefox debugger shows the following stylesheet link:

<link rel="stylesheet" href="/w/load.php?lang=en&amp;modules=ext.cite.styles%7Cext.uls.interlanguage%7Cext.visualEditor.desktopArticleTarget.noscript%7Cext.wikimediaBadges%7Cjquery.makeCollapsible.styles%7Cmediawiki.page.gallery.styles%7Cskins.vector.styles.legacy%7Cwikibase.client.init&amp;only=styles&amp;skin=vector">

But after saving the page with Firefox (for example this article: en.wikipedia.org/wiki/Pablo_Picasso), I have:

<link rel="stylesheet" href="Pablo%20Picasso%20-%20Wikipedia_files/load.css">
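The href in the original page is relative, so it only identifies a resource together with the address of the page. As a small sketch (the class name, the method name, and the shortened href are made up for illustration), jsoup can resolve such a link with `absUrl` when the document is parsed with a base URI:

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class ResolveStylesheetUrl {

    // Returns the absolute URL of the first stylesheet link, or "" if there is none.
    // (Hypothetical helper, not part of jsoup.)
    static String firstStylesheetUrl(String html, String baseUri) {
        Document doc = Jsoup.parse(html, baseUri);
        Element link = doc.selectFirst("link[rel=stylesheet][href]");
        // absUrl resolves the (entity-decoded) href against the base URI
        return link == null ? "" : link.absUrl("href");
    }

    public static void main(String[] args) {
        // Shortened version of the Wikipedia link, for readability
        String html = "<link rel=\"stylesheet\" "
                + "href=\"/w/load.php?lang=en&amp;only=styles&amp;skin=vector\">";
        System.out.println(
                firstStylesheetUrl(html, "https://en.wikipedia.org/wiki/Pablo_Picasso"));
        // prints https://en.wikipedia.org/w/load.php?lang=en&only=styles&skin=vector
    }
}
```

The resulting absolute URL can be requested like any other HTTP resource, whatever generates it on the server side.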

Update: if the page uses only local links, I can do something like:

import java.io.BufferedWriter;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// FileUtilities, Utilities, url, idirURL, articleDir, _dir and file are defined elsewhere (not shown).
try {
   Document doc = Jsoup.parse(new File(url.getFile()), "UTF-8");
   Elements images = doc.select("img[src]");
   Elements stylesheets = doc.select("link[rel=stylesheet][href]");

   // Copy each local image next to the article and point src at the copy
   for (Element img : images) {
      String src = img.attr("src");
      if (!src.isEmpty()) {
         URL _url = FileUtilities.getChildURL(idirURL, src);
         if (FileUtilities.exist(_url)) {
            File outputFile = Utilities.copy(articleDir, _url);
            img.attr("src", FileUtilities.getRelativePath(_dir, outputFile));
         }
      }
   }

   // Same for each local stylesheet
   for (Element link : stylesheets) {
      String href = link.attr("href");
      if (!href.isEmpty()) {
         URL _url = FileUtilities.getChildURL(idirURL, href);
         if (FileUtilities.exist(_url)) {
            File outputFile = Utilities.copy(articleDir, _url);
            link.attr("href", FileUtilities.getRelativePath(_dir, outputFile));
         }
      }
   }

   // Write the modified page with the same encoding it was parsed with
   String html = doc.html();
   try (BufferedWriter _writer = new BufferedWriter(
         new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
      _writer.append(html);
   }
} catch (IOException e) {
   // Do not swallow the error silently
   e.printStackTrace();
}

But if the linked resources (for example stylesheets) are generated by PHP, this does not work, because I think I must get the content by communicating with the PHP server.

The question is: is it possible with JSoup to get the stylesheet resources, download them, replace the links with local ones, and save both the HTML page and these resources?
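One possible approach, sketched here under assumptions rather than as a definitive implementation: resolve each stylesheet link to an absolute URL, download the bytes with jsoup's own HTTP connection, write them to a local directory, and rewrite the href before saving the page. The class name `PageWithStylesSaver`, the `Fetcher` interface, the output file names, and the `styleN.css` naming scheme are all hypothetical choices made for this example.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class PageWithStylesSaver {

    /** Hypothetical seam: returns the bytes behind an absolute URL. */
    interface Fetcher {
        byte[] fetch(String absoluteUrl) throws IOException;
    }

    /** Fetcher backed by jsoup's HTTP connection. */
    static byte[] httpFetch(String absoluteUrl) throws IOException {
        return Jsoup.connect(absoluteUrl)
                .ignoreContentType(true) // CSS is not HTML/XML, jsoup rejects it otherwise
                .maxBodySize(0)          // 0 = no size limit, avoids truncated stylesheets
                .execute()
                .bodyAsBytes();
    }

    /** Saves every linked stylesheet into resourceDir, rewrites the links, writes the page. */
    static void localizeStylesheets(Document doc, Path htmlFile, Path resourceDir,
                                    Fetcher fetcher) throws IOException {
        Files.createDirectories(resourceDir);
        Path htmlDir = htmlFile.toAbsolutePath().getParent();
        int index = 0;
        for (Element link : doc.select("link[rel=stylesheet][href]")) {
            String absUrl = link.absUrl("href"); // resolved against the document's base URI
            if (absUrl.isEmpty()) {
                continue; // href could not be resolved, leave the link untouched
            }
            Path cssFile = resourceDir.resolve("style" + (index++) + ".css");
            Files.write(cssFile, fetcher.fetch(absUrl));
            // Link relative to the saved page, with forward slashes on every platform
            String relative = htmlDir.relativize(cssFile.toAbsolutePath()).toString();
            link.attr("href", relative.replace('\\', '/'));
        }
        doc.charset(StandardCharsets.UTF_8); // keep the meta charset in sync with the output
        Files.write(htmlFile, doc.outerHtml().getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Document doc = Jsoup.connect("https://en.wikipedia.org/wiki/Pablo_Picasso").get();
        localizeStylesheets(doc,
                Paths.get("Pablo_Picasso.html"),
                Paths.get("Pablo_Picasso_files"),
                PageWithStylesSaver::httpFetch);
    }
}
```

The same loop could be applied to `img[src]` elements. Note that this sketch does not look inside the downloaded CSS, so `url(...)` and `@import` references in the stylesheets still point to the original server, and it only matches links whose `rel` value is exactly `stylesheet`.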



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
