Crawling Multiple URLs in One Go Using the Task Parallel Library
I want to crawl data from multiple URLs and store it in SQLite. Should I use Parallel.Invoke or a parallel foreach loop to crawl the URLs and fetch the data? I am confused about how to execute this part of my project. I am also struggling with how to start the part that actually crawls articles in different languages from a website.
Solution 1:[1]
The TPL (Task Parallel Library) vs. async/await question comes down to whether your task is CPU bound (calculating multiple things in parallel) or I/O bound (interacting with multiple files or making network requests).
Because you want to crawl multiple URLs, your job is I/O bound, which makes it a good candidate for async/await. You can then request all (or a subset) of the URLs in your list concurrently. Example code would look something like this:
private static readonly HttpClient httpClient = new HttpClient();

public async Task<IReadOnlyList<string>> GetContent(IEnumerable<string> urls)
{
    // Start one download per URL; they all run concurrently.
    var tasks = urls.Select(GetContent);
    return await Task.WhenAll(tasks);
}

private async Task<string> GetContent(string url)
{
    return await httpClient.GetStringAsync(url);
}
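In practice you usually don't want to fire off every request at once: a large URL list can exhaust connections or get you rate-limited by the target site. A common pattern is to cap concurrency with a SemaphoreSlim. The sketch below assumes the same HttpClient field as above; GetContentThrottled and its maxConcurrency parameter are hypothetical names, not part of the original answer:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class Crawler
{
    private static readonly HttpClient httpClient = new HttpClient();

    // Sketch: download all URLs, but allow at most maxConcurrency
    // requests in flight at any one time.
    public static async Task<IReadOnlyList<string>> GetContentThrottled(
        IEnumerable<string> urls, int maxConcurrency = 4)
    {
        using var gate = new SemaphoreSlim(maxConcurrency);

        var tasks = urls.Select(async url =>
        {
            await gate.WaitAsync();          // wait for a free slot
            try
            {
                return await httpClient.GetStringAsync(url);
            }
            finally
            {
                gate.Release();              // free the slot for the next URL
            }
        }).ToList();                          // materialize so all tasks start

        return await Task.WhenAll(tasks);
    }
}
```

Once the pages are downloaded, writing them to SQLite is best done sequentially (or behind its own lock), since a single SQLite connection should not be shared across concurrent writers.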
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Oliver |
