How to write this crawler in JavaScript?

The idea is very simple:

Imagine a plain white page with a form containing a single input field (like the Google homepage). When I enter a link to a blog post in this form, the JavaScript crawler fetches the blog post's page (through Ajax), finds the first image in it, shows that image on the white page, and saves it on my server.

This crawler works like Digg and the Facebook wall do when you post a link.

Which functions do I have to use for this crawler?



Solution 1:[1]

Darin is right: because of the same-origin policy, JavaScript running in the browser cannot read content from another domain through Ajax. It can, however, dynamically add script tags to the document and thereby include scripts from other domains (for details, look up JSONP).
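To make the script tag trick concrete, here is a minimal sketch of a JSONP request. The helper names (`buildJsonpUrl`, `loadJsonp`) and the `callback` query parameter are assumptions for illustration; each JSONP service documents its own parameter name, and the remote server has to support JSONP for this to work at all.

```javascript
// Build a JSONP request URL. The server is asked to wrap its JSON payload in
// a call to the global function named by `callbackName`.
// Assumption: the service reads the function name from a "callback" parameter.
function buildJsonpUrl(endpoint, params, callbackName) {
  const url = new URL(endpoint);
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  url.searchParams.set("callback", callbackName);
  return url.toString();
}

// Browser only: inject a <script> tag pointing at the other domain. When the
// script loads, it calls the temporary global callback with the data.
function loadJsonp(endpoint, params, onData) {
  const callbackName = "jsonpCallback_" + Date.now();
  const script = document.createElement("script");
  window[callbackName] = function (data) {
    delete window[callbackName]; // clean up the temporary global
    script.remove();             // and the script tag itself
    onData(data);
  };
  script.src = buildJsonpUrl(endpoint, params, callbackName);
  document.head.appendChild(script);
}
```

Keep in mind that the included script runs with full access to your page, so this should only be pointed at services you trust.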

I suggest using YQL. With Yahoo's YQL library you can crawl any page you want using only JavaScript. Yahoo's servers fetch the URLs you request, parse the HTML, and send you the parts of the documents you asked for.
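For the "first image of a blog post" case, a rough sketch of the YQL approach could look like the following. Several things here are assumptions rather than confirmed details: the endpoint is the historical public YQL URL and may no longer respond (Yahoo has since retired the service), the response shape `query.results.img` is how I remember YQL returning matches from its `html` table, and the function names are made up for this example.

```javascript
// Assumption: the historical public YQL endpoint.
const YQL_ENDPOINT = "https://query.yahooapis.com/v1/public/yql";

// Build the request URL for a YQL statement that selects every <img> element
// of the given page. `callbackName` is the JSONP callback the response calls.
function buildYqlImageUrl(pageUrl, callbackName) {
  const safeUrl = pageUrl.replace(/"/g, "%22"); // keep the YQL string literal intact
  const statement =
    'select * from html where url="' + safeUrl + '" and xpath="//img"';
  const params = new URLSearchParams({
    q: statement,
    format: "json",
    callback: callbackName,
  });
  return YQL_ENDPOINT + "?" + params.toString();
}

// Pick the first image out of the response and resolve its src against the
// page URL, so that relative paths like "/img/a.png" become absolute.
// Assumption: matches arrive under query.results.img, as a single object for
// one match or as an array for several.
function firstImageSrc(response, pageUrl) {
  const results = response && response.query && response.query.results;
  if (!results || !results.img) return null;
  const first = Array.isArray(results.img) ? results.img[0] : results.img;
  if (!first || !first.src) return null;
  return new URL(first.src, pageUrl).toString();
}
```

The URL from `buildYqlImageUrl` would be loaded through a dynamically added script tag, as described above. Once `firstImageSrc` gives you an absolute URL, you can set it as the `src` of an image element on your page and post it to your own server, which can then download and store the file.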

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Murat Çorlu