Crawling multiple URLs at once using the Task Parallel Library

I want to crawl data from multiple URLs and store it in SQLite. Should I use Parallel.Invoke or a Parallel.ForEach loop to crawl the URLs and fetch the data? I am unsure how to implement this part of my project, which crawls articles in different languages from a website, and I am struggling with how to get started.



Solution 1:[1]

Choosing between the TPL (Task Parallel Library) and async/await comes down to whether your work is CPU-bound (computing several things in parallel) or I/O-bound (waiting on files or network requests).

Because you want to crawl multiple URLs, your work is I/O-bound, which makes it a good candidate for async/await. You can request all of the URLs in your list (or a subset) concurrently. Example code could look something like this:

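using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Members of your crawler class. Reuse one HttpClient for all requests.
private static readonly HttpClient httpClient = new HttpClient();

public async Task<IReadOnlyList<string>> GetContent(IEnumerable<string> urls)
{
    // Start every download, then wait until all of them have finished.
    var tasks = urls.Select(url => GetContent(url));
    return await Task.WhenAll(tasks);
}

private async Task<string> GetContent(string url)
{
    return await httpClient.GetStringAsync(url);
}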
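
If the list of URLs is long, starting every request at once may overload the target site or your own connection. The following is only a sketch of one way to request a limited number of URLs at a time, assuming a SemaphoreSlim is used as a gate. The class name ThrottledCrawler and the maxConcurrency parameter are hypothetical and not part of the original answer.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledCrawler
{
    // One shared HttpClient for all requests.
    private static readonly HttpClient httpClient = new HttpClient();

    // maxConcurrency is an illustrative parameter: the maximum number of
    // downloads allowed to be in flight at the same time.
    public static async Task<IReadOnlyList<string>> GetContent(
        IEnumerable<string> urls, int maxConcurrency)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = urls.Select(async url =>
            {
                // Wait for a free slot before starting the request.
                await gate.WaitAsync();
                try
                {
                    return await httpClient.GetStringAsync(url);
                }
                finally
                {
                    // Free the slot even if the request failed.
                    gate.Release();
                }
            }).ToList(); // Materialize so each task is created exactly once.

            // Results come back in the same order as the input URLs.
            return await Task.WhenAll(tasks);
        }
    }
}
```

A call such as `await ThrottledCrawler.GetContent(urls, 4)` would then keep at most four requests running at once. If any request fails, awaiting the result rethrows that exception, so you may want to catch errors per URL instead.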

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Oliver