Axios & Cheerio - Visit a list of URLs contained in a JSON file, and add extracted data into that object?
I'm attempting to build a scraper with Axios & Cheerio.
I need to do this in two parts. The first part, which I've already completed, visits a site and writes a JSON file containing a list of objects like so:
[
  {
    "count": 0,
    "title": "Result One",
    "url": "http://www.resultone.com"
  },
  {
    "count": 1,
    "title": "Result Two",
    "url": "http://www.resulttwo.com"
  },
  {
    "count": 2,
    "title": "Result Three",
    "url": "http://www.resultone.com"
  }
]
Now for the second part, I need to read this JSON file, visit each URL listed, extract some data from the page and add it to the current JSON object in the original file.
Once the JSON file is created, I can run the following:
let json_url_list = require('./' + outputFile);
// Loop over the URLs
for (let i = 0; i < json_url_list.length; ++i) {
  let url = json_url_list[i].url;
  // Run a function here to visit the URL and extract data
  getNewData(url);
}
Along with a function like so:
// Create new function to visit each of the URLs captured.
const getNewData = async (url) => {
  try {
    const response = await axios.get(url);
    const $ = cheerio.load(response.data);
    // Get the data here (using page title for example)
    const title = $('title').text();
    // TODO: Add the new data above to the original JSON object in the file we're reading from
    return false;
  } catch (error) {
    console.error(error);
  }
}
But this is where I run out of ideas on how to make this work. Could anyone point me in the right direction?
Thank you!
Solution 1:[1]
Try this.
https://stackblitz.com/edit/js-tmmnqj?file=index.js
const axios = require('axios');
const cheerio = require('cheerio');
// json_url_list is the array loaded from the JSON file, as in the question.

// Create new function to visit each of the URLs captured.
const getNewData = async (url) => {
  try {
    const response = await axios.get(url);
    const $ = cheerio.load(response.data);
    // Get the data here (using page title for example)
    const title = $('title').text();
    console.log(title);
    // TODO: Add the new data above to the original JSON object in the file we're reading from
    return false;
  } catch (error) {
    console.error(error);
  }
};

// Wrap the loop in an async function so each request can be awaited in turn.
(async () => {
  // Loop over the URLs
  for (const jsonData of json_url_list) {
    const url = jsonData.url;
    // Run a function here to visit the URL and extract data
    const data = await getNewData(url);
  }
})();
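The code above awaits each request but still leaves the TODO open. One possible way to finish it is sketched below, under a few assumptions: the extracted value is stored under a hypothetical `pageTitle` property, failed requests are recorded under a hypothetical `error` property, and the function names (`fetchTitle`, `enrichEntries`, `updateFile`) are illustrative rather than part of the original answer. The extractor is passed in as a parameter so the merge logic can be tried without network access.

```javascript
const fs = require('fs/promises');

// Default extractor: fetch the page with axios and read its <title> with cheerio.
// The libraries are required lazily so the merge logic below can run without them.
async function fetchTitle(url) {
  const axios = require('axios');
  const cheerio = require('cheerio');
  const response = await axios.get(url);
  const $ = cheerio.load(response.data);
  return $('title').text();
}

// Visit every entry in turn and return copies that carry the extracted data.
// `pageTitle` and `error` are hypothetical property names chosen for this sketch.
async function enrichEntries(entries, extract = fetchTitle) {
  const enriched = [];
  for (const entry of entries) {
    try {
      const pageTitle = await extract(entry.url);
      enriched.push({ ...entry, pageTitle });
    } catch (error) {
      // Keep the original fields if the request fails, and record why.
      enriched.push({ ...entry, error: error.message });
    }
  }
  return enriched;
}

// Read the JSON file, enrich each object, and write the result back to the same path.
async function updateFile(path, extract = fetchTitle) {
  const entries = JSON.parse(await fs.readFile(path, 'utf8'));
  const enriched = await enrichEntries(entries, extract);
  await fs.writeFile(path, JSON.stringify(enriched, null, 2));
  return enriched;
}
```

With the names from the question, a call such as `updateFile('./' + outputFile).catch(console.error);` would run the whole second step. The file is read with `fs.readFile` rather than `require` here because `require` caches the parsed JSON, which can be surprising if the file is rewritten during the same run.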
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | k22pr |
