node.js check if a remote URL exists
How do I check to see if a URL exists without pulling it down? I use the following code, but it downloads the whole file. I just need to check that it exists.
app.get('/api/v1/urlCheck/', function (req,res) {
var url=req.query['url'];
var request = require('request');
request.get(url, {timeout: 30000, json:false}, function (error, result) {
res.send(result.body);
});
});
Appreciate any help!
Solution 1:[1]
2021 update
Use url-exist:
import urlExist from 'url-exist';
const exists = await urlExist('https://google.com');
// Handle result
console.log(exists);
2020 update
request has now been deprecated which has brought down url-exists with it. Use url-exist instead.
const urlExist = require("url-exist");
(async () => {
const exists = await urlExist("https://google.com");
// Handle result
console.log(exists)
})();
If you (for some reason) need to use it synchronously, you can use url-exist-sync.
2019 update
Since 2017, request and callback-style functions (from url-exists) have fallen out of use.
However, there is a fix. Swap url-exists for url-exist.
So instead of using:
const urlExists = require("url-exists")
urlExists("https://google.com", (_, exists) => {
// Handle result
console.log(exists)
})
Use this:
const urlExist = require("url-exist");
(async () => {
const exists = await urlExist("https://google.com");
// Handle result
console.log(exists)
})();
Original answer (2017)
If you have access to the request package, you can try this:
const request = require("request")
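// Resolves to true for any 2xx status; rejects on network errors instead of never settling.
const urlExists = url => new Promise((resolve, reject) => request.head(url).on("response", res => resolve(res.statusCode.toString()[0] === "2")).on("error", reject))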
urlExists("https://google.com").then(exists => console.log(exists)) // true
Most of this logic is already provided by url-exists.
Solution 2:[2]
Try this:
var http = require('http'),
options = {method: 'HEAD', host: 'stackoverflow.com', port: 80, path: '/'},
req = http.request(options, function(r) {
console.log(JSON.stringify(r.headers));
});
req.end();
Solution 3:[3]
Thanks! Here it is, encapsulated in a function (updated on 5/30/17 with the require outside):
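var http = require('http'),
    url = require('url');
exports.checkUrlExists = function (Url, callback) {
    var parsed = url.parse(Url); // parse once and reuse the result
    var options = {
        method: 'HEAD',
        host: parsed.hostname, // hostname excludes any ":port" suffix
        port: parsed.port || 80,
        path: parsed.pathname
    };
    var req = http.request(options, function (r) {
        callback(r.statusCode == 200);
    });
    // Report unreachable hosts as "does not exist" instead of throwing.
    req.on('error', function () {
        callback(false);
    });
    req.end();
};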
It's very quick (I get about 50 ms, but it will depend on your connection and the server speed). Note that it's also quite basic, i.e. it won't handle redirects very well...
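Because the function above stops at the first response, a URL that redirects is reported as missing. Below is a minimal sketch of one way to follow redirects using only the built-in modules. The name `headFollow` and the `maxRedirects` default are assumptions made for illustration, not part of the original answer.

```javascript
const http = require('http');
const https = require('https');

// Hypothetical helper: sends HEAD requests and follows up to `maxRedirects`
// redirects. Resolves to true for a final 2xx status, false otherwise.
function headFollow(target, maxRedirects = 5) {
  return new Promise((resolve) => {
    let parsed;
    try {
      parsed = new URL(target);
    } catch {
      resolve(false); // not a valid absolute URL
      return;
    }
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
      resolve(false); // only http and https are supported here
      return;
    }
    const client = parsed.protocol === 'https:' ? https : http;
    const req = client.request(parsed, { method: 'HEAD' }, (res) => {
      res.resume(); // discard any body so the socket is released
      const { statusCode, headers } = res;
      if (statusCode >= 300 && statusCode < 400 && headers.location) {
        if (maxRedirects <= 0) {
          resolve(false); // too many redirects
          return;
        }
        let next;
        try {
          // Location may be relative, so resolve it against the current URL.
          next = new URL(headers.location, parsed).href;
        } catch {
          resolve(false);
          return;
        }
        resolve(headFollow(next, maxRedirects - 1));
        return;
      }
      resolve(statusCode >= 200 && statusCode < 300);
    });
    req.on('error', () => resolve(false)); // DNS failure, refused connection, ...
    req.end();
  });
}
```

It would be called as `headFollow('https://stackoverflow.com/').then(console.log)`. Whether a redirect loop or a 3xx without a Location header should count as "exists" is a design choice; this sketch treats both as false.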
Solution 4:[4]
Simply use the url-exists npm package to test whether a URL exists:
var urlExists = require('url-exists');
urlExists('https://www.google.com', function(err, exists) {
console.log(exists); // true
});
urlExists('https://www.fakeurl.notreal', function(err, exists) {
console.log(exists); // false
});
Solution 5:[5]
Calling require inside a function is the wrong approach in Node.
The following ES6 code treats every 2xx and 3xx status as success and, of course, passes an error to the callback if you give it a bad host such as fff.kkk:
const http = require('http');
function checkUrlExists(host, cb) {
    http.request({ method: 'HEAD', host, port: 80, path: '/' }, (r) => {
        cb(null, r.statusCode >= 200 && r.statusCode < 400);
    }).on('error', cb).end();
}
Solution 6:[6]
Take a look at the url-exists npm package https://www.npmjs.com/package/url-exists
Setting up:
$ npm install url-exists
Usage:
const urlExists = require('url-exists');
urlExists('https://www.google.com', function(err, exists) {
console.log(exists); // true
});
urlExists('https://www.fakeurl.notreal', function(err, exists) {
console.log(exists); // false
});
You can also promisify it to take advantage of await and async:
const util = require('util');
const urlExists = util.promisify(require('url-exists'));
(async () => { // await needs an async function in CommonJS modules
    let isExists = await urlExists('https://www.google.com'); // true
    isExists = await urlExists('https://www.fakeurl.notreal'); // false
})();
Happy coding!
Solution 7:[7]
A lot of people have recommended a library, but url-exist depends on a data-fetching library, so here is a clone of it that uses only native Node modules:
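const http = require('http');
const https = require('https');
const { URL } = require('url');
// https://github.com/sindresorhus/is-url-superb/blob/main/index.js
function isUrl(str) {
    if (typeof str !== 'string') {
        return false;
    }
    const trimmedStr = str.trim();
    if (trimmedStr.includes(' ')) {
        return false;
    }
    try {
        new URL(trimmedStr); // eslint-disable-line no-new
        return true;
    } catch {
        return false;
    }
}
// https://github.com/Richienb/url-exist/blob/master/index.js
function urlExists(url) {
    return new Promise((resolve) => {
        if (!isUrl(url)) {
            resolve(false);
            return; // do not send a request for an invalid URL
        }
        const target = new URL(url.trim());
        // Pick the module that matches the scheme; other schemes are unsupported.
        const client = target.protocol === 'https:' ? https
            : target.protocol === 'http:' ? http
            : null;
        if (!client) {
            resolve(false);
            return;
        }
        const req = client.request(target, { method: 'HEAD' }, (res) => {
            res.resume(); // discard any body so the socket is released
            // Like url-exist, treat everything except 4xx as "exists".
            resolve(res.statusCode < 400 || res.statusCode >= 500);
        });
        req.on('error', () => resolve(false)); // e.g. DNS failure
        req.end();
    });
}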
urlExists(
'https://stackoverflow.com/questions/26007187/node-js-check-if-a-remote-url-exists'
).then(console.log);
This might also appeal to those who'd rather not install a dependency for a very simple purpose.
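If you are on a Node version that ships a global `fetch` (Node 18 and later), a dependency-free check can be shorter still. The following is a sketch under that assumption. The function name `urlExistsFetch` and the 5-second default timeout are illustrative choices, and unlike the clone above it counts 5xx responses as not existing.

```javascript
// Assumes Node 18+, where fetch and AbortSignal.timeout are built in.
// Hypothetical helper: resolves to true only when the final response is 2xx.
async function urlExistsFetch(url, timeoutMs = 5000) {
  try {
    const res = await fetch(url, {
      method: 'HEAD', // ask for headers only, no body
      redirect: 'follow', // follow redirects (this is fetch's default)
      signal: AbortSignal.timeout(timeoutMs), // give up after timeoutMs
    });
    return res.ok; // true for 2xx statuses
  } catch {
    return false; // invalid URL, network failure, or timeout
  }
}
```

It would be called as `urlExistsFetch('https://stackoverflow.com/').then(console.log)`. Some servers answer HEAD differently from GET, so it is worth confirming the behaviour against the hosts you care about.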
Solution 8:[8]
Using the other responses as reference, here's a promisified version which also works with https uris (for node 6+):
const http = require('http');
const https = require('https');
const url = require('url');
const request = (opts = {}, cb) => {
    const requester = opts.protocol === 'https:' ? https : http;
    return requester.request(opts, cb);
};
module.exports = target => new Promise((resolve, reject) => {
    let uri;
    try {
        uri = url.parse(target);
    } catch (err) {
        reject(new Error(`Invalid url ${target}`));
        return; // stop here, uri is undefined
    }
    const options = {
        method: 'HEAD',
        host: uri.hostname, // hostname excludes any ":port" suffix
        protocol: uri.protocol,
        port: uri.port,
        path: uri.path,
        timeout: 5 * 1000,
    };
    const req = request(options, (res) => {
        const { statusCode } = res;
        res.resume(); // discard any body so the socket is released
        if (statusCode >= 200 && statusCode < 300) {
            resolve(target);
        } else {
            reject(new Error(`Url ${target} not found.`));
        }
    });
    req.on('error', reject);
    // The timeout option only emits an event; the request has to be aborted by hand.
    req.on('timeout', () => req.destroy(new Error(`Request to ${target} timed out.`)));
    req.end();
});
It can be used like this:
const urlExists = require('./url-exists')
urlExists('https://www.google.com')
.then(() => {
console.log('Google exists!');
})
.catch(() => {
console.error('Invalid url :(');
});
Solution 9:[9]
I see in your code that you are already using the request library, so just:
const request = require('request');
request.head('http://...', (error, res) => {
const exists = !error && res.statusCode === 200;
});
Solution 10:[10]
If you're using axios, you can fetch the head like:
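const axios = require('axios');
const checkUrl = async (url) => {
    try {
        await axios.head(url);
        return true;
    } catch (error) {
        // error.response is missing for network errors (DNS failure, timeout, ...).
        if (!error.response || error.response.status >= 400) {
            return false;
        }
        return true; // a non-2xx response below 400, e.g. an unfollowed redirect
    }
}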
You may want to customise the status code range for your requirements e.g. 401 (Unauthorized) could still mean a URL exists but you don't have access.
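As a rough illustration of customising that range, the sketch below keeps the decision separate from the HTTP client. The function name `statusMeansExists` and the choice of 401 and 403 as "exists but restricted" are assumptions, so adjust them to your requirements.

```javascript
// Hypothetical helper: decides whether an HTTP status code should count as
// "the URL exists". 2xx and 3xx always count; `alsoExists` lists extra codes
// (401 and 403 by default) that point to a resource you cannot access.
function statusMeansExists(status, alsoExists = [401, 403]) {
  if (!Number.isInteger(status)) {
    return false; // no response at all, e.g. a DNS or connection failure
  }
  return (status >= 200 && status < 400) || alsoExists.includes(status);
}

console.log(statusMeansExists(204)); // true
console.log(statusMeansExists(301)); // true
console.log(statusMeansExists(401)); // true
console.log(statusMeansExists(404)); // false
console.log(statusMeansExists(undefined)); // false
```

With axios, a predicate like this could be passed as the `validateStatus` request option, so that the listed statuses resolve instead of rejecting.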
Solution 11:[11]
My awaitable async ES6 solution, doing a HEAD request:
// options for the http request
let options = {
host: 'google.de',
//port: 80, optional
//path: '/' optional
}
const http = require('http');
// creating a promise (all promises can be awaited)
let isOk = await new Promise(resolve => {
// trigger the request ('HEAD' or 'GET' - you should check if you get the expected result for a HEAD request first (curl))
// then trigger the callback
http.request({method:'HEAD', host:options.host, port:options.port, path: options.path}, result =>
resolve(result.statusCode >= 200 && result.statusCode < 400)
).on('error', () => resolve(false)).end(); // resolve false on errors; resolving with the Error object would be truthy
});
// check if the result was NOT ok
if (!isOk)
console.error('could not get: ' + options.host);
else
console.info('url exists: ' + options.host);
Solution 12:[12]
The request module is currently being deprecated, as @schlicki pointed out. One of the alternatives in the link he posted is got:
const got = require('got');
(async () => {
try {
const response = await got('https://www.nodesource.com/');
console.log(response.body);
//=> '<!doctype html> ...'
} catch (error) {
console.log(error.response.body);
//=> 'Internal server error ...'
}
})();
But with this method, you get the whole HTML page in response.body. In addition, got has many more features than you may need. That's why I wanted to add another alternative to the list. As I was already using the portscanner library, I could use it for the same purpose without downloading the content of the website. Note that this only tells you whether the host accepts connections on that port, not whether a particular path exists. You may need to check port 443 as well if the website uses https.
var portscanner = require('portscanner')
// Checks the status of a single port
portscanner.checkPortStatus(80, 'www.google.es', function(error, status) {
// Status is 'open' if currently in use or 'closed' if available
console.log(status)
})
Anyway, the closest approach is the url-exist module, as @Richie Bendall explains in his post. I just wanted to add another alternative.
Solution 13:[13]
danwarfel's answer got me some of the way there but it's still not quite right: it leaks memory, doesn't follow redirects, doesn't support https (likely what you want) and doesn't actually answer the question - it just logs headers! Here's my version:
import * as https from "https";
// Return true if the URL is found and returns 200. Returns false if there are
// network errors or the status code is not 200. It will throw an exception
// for configuration errors (e.g. malformed URLs).
//
// Note this only supports https, not http.
//
async function isUrlFound(url: string, maxRedirects = 20): Promise<boolean> {
const [statusCode, location] = await new Promise<[number?, string?]>(
(resolve, _reject) => {
const req = https.request(
url,
{
method: "HEAD",
},
response => {
// This is necessary to avoid memory leaks.
response.on("readable", () => response.read());
resolve([response.statusCode, response.headers["location"]]);
},
);
req.on("error", _err => resolve([undefined, undefined]));
req.end();
},
);
if (
statusCode !== undefined &&
statusCode >= 300 &&
statusCode < 400 &&
location !== undefined &&
maxRedirects > 0
) {
return isUrlFound(location, maxRedirects - 1);
}
return statusCode === 200;
}
Minimally tested but it seems to work.
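One case worth checking is a relative `Location` header such as `/login`, which is not a valid absolute URL to pass to `https.request` on its own. Here is a minimal sketch, in plain JavaScript, of resolving it against the URL that was just requested. The helper name `resolveLocation` is hypothetical.

```javascript
// Hypothetical helper: turns a Location header value into an absolute URL,
// using the URL that produced the redirect as the base.
function resolveLocation(currentUrl, location) {
  return new URL(location, currentUrl).href;
}

console.log(resolveLocation('https://example.com/a/b', '/login'));
// https://example.com/login
console.log(resolveLocation('https://example.com/a/b', 'c'));
// https://example.com/a/c
console.log(resolveLocation('https://example.com/a/b', 'https://other.example/x'));
// https://other.example/x
```

In `isUrlFound`, the recursive call could then pass `resolveLocation(url, location)` instead of `location`.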
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
