'Task.WhenAny - alternative to List avoiding O(N²) issues
I've been trying to improve my understanding and usage of async code in C#, specifically how to integrate it into existing synchronous code.
I have the following test program, which is basically the test from https://docs.microsoft.com/en-us/dotnet/csharp/programming-guide/concepts/async/start-multiple-async-tasks-and-process-them-as-they-complete?pivots=dotnet-6-0 with a synchronous caller and a LinqPad runnable wrapper.
void Main()
{
var a = new A();
List<string> urls = new List<string>()
{
"https://docs.microsoft.com/dotnet",
"https://docs.microsoft.com/en-us/dotnet/api/system.threading.tasks.task.whenall?view=net-6.0",
"https://stackoverflow.com/questions/11836325/await-operator-can-only-be-used-within-an-async-method"
};
a.GetUrlContentLengths(urls).Dump();
}
public class A
{
public int GetUrlContentLengths(IEnumerable<string> urls)
{
return Task.Run<int>(async() => await GetUrlContentLengthsAsync(urls)).Result;
}
public async Task<int> GetUrlContentLengthsAsync(IEnumerable<string> urls)
{
System.Net.Http.HttpClient client = new System.Net.Http.HttpClient();
IEnumerable<Task<int>> downloadTasksQuery = urls.Select(x => ProcessUrlAsync(x, client));
var downloadTasks = downloadTasksQuery.ToList();
int total = 0;
while (downloadTasks.Any())
{
Task<int> finishedTask = await Task.WhenAny(downloadTasks);
downloadTasks.Remove(finishedTask);
total += await finishedTask;
}
return total;
}
public async Task<int> ProcessUrlAsync(string url, System.Net.Http.HttpClient client)
{
byte[] content = await client.GetByteArrayAsync(url);
Console.WriteLine($"{url,-60} {content.Length,10:#,#}");
return content.Length;
}
}
This linked document describes the O(n²) issues like this:
What we’ve effectively created here is an O(N2) algorithm: for each task, we search the list for the task to remove it, which is an O(N) operation, and we register a continuation with each task, which is also an O(N) operation
So would this minor change to a Dictionary fix this and leave the whole thing as an O(n) operation?
public async Task<int> GetUrlContentLengthsAsync(IEnumerable<string> urls)
{
System.Net.Http.HttpClient client = new System.Net.Http.HttpClient();
IEnumerable<Task<int>> downloadTasksQuery = urls.Select(x => ProcessUrlAsync(x, client));
var downloadTasks = downloadTasksQuery.ToDictionary(xk => xk.GetHashCode(), xv => xv);
int total = 0;
while (downloadTasks.Any())
{
Task<int> finishedTask = await Task.WhenAny(downloadTasks.Values);
downloadTasks.Remove(finishedTask.GetHashCode());
total += await finishedTask;
}
return total;
}
Solution 1:[1]
So would this minor change to a
Dictionaryfix this and leave the whole thing as an O(n) operation?
No. Searching a List<T> is indeed a O(n) operation, but eliminating this operation does not eliminate all O(n) operations that are happening inside the while loop. There is one more O(n) operation hidden inside the Task.WhenAny method, which has a far greater impact (overhead) at slowing down your code than searching in the list. The hidden operation is the attaching of continuations on all incomplete tasks in the downloadTasks collection, and then detaching these continuations when any of the tasks completes. That's a lot of work to do, because it involves memory allocations and synchronization overhead, and the only way to avoid it is to avoid using the WhenAny-in-a-loop antipattern. Here is an alternative O(n) implementation of your algorithm. It's O(n) because only one continuation is attached on each task, by the Task.WhenAll method:
public async Task<int> GetUrlContentLengthsAsync(IEnumerable<string> urls)
{
HttpClient client = new();
int total = 0;
Task<int>[] higherOrderTasks = urls.Select(async url =>
{
int result = await ProcessUrlAsync(url, client).ConfigureAwait(false);
Interlocked.Add(ref total, result);
return result;
}).ToArray();
await Task.WhenAll(higherOrderTasks);
return total;
}
A higher order task is created for each ProcessUrlAsync task, that wraps this task and incorporates the code that should run when the task completes. The continuations after the await ProcessUrlAsync might run concurrently to each other, so you might have to synchronize the access to any shared state that you might have to mutate, like the total variable in the above example. Unless you know for sure that your code will run on a SynchronizationContext that will synchronize the continuations, in which case you should also remove the .ConfigureAwait(false). In this specific case it is actually possible to get rid of the higher order tasks and the shared state altogether, like this:
public async Task<int> GetUrlContentLengthsAsync(IEnumerable<string> urls)
{
HttpClient client = new();
Task<int>[] tasks = urls
.Select(url => ProcessUrlAsync(url, client))
.ToArray();
int[] results = await Task.WhenAll(tasks);
return results.Sum();
}
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Theodor Zoulias |
