How to ensure a C# Task runs only once at a time for a given ID

On several occasions, in different applications, I have needed the following behavior from C# Tasks. I implemented it in a certain way, and I would like to know whether it's the best way to achieve the desired effect or whether there are better ways.

The issue is that in certain circumstances I want a specific Task to exist only as a single instance. For example, if someone requests a list of products by executing a method like Task GetProductsAsync(), and someone else requests the same thing while it is still running, no second task should be started; instead, the already existing task should be returned. When GetProductsAsync finishes, all callers who requested the result receive the same result. So there should only ever be one GetProductsAsync execution at any given point in time.

After failing to find a similar, well-known design pattern for this problem, I came up with my own implementation. Here it is:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// ITaskManager and TaskManagerException are the application's own types (not shown).
public class TaskManager : ITaskManager
{
    private readonly object _taskLocker = new object();
    private readonly Dictionary<string, Task> _tasks = new Dictionary<string, Task>();          // in-flight tasks by ID
    private readonly Dictionary<string, Task> _continuations = new Dictionary<string, Task>();  // cleanup continuations by ID

    public Task<T> ExecuteOnceAsync<T>(string taskId, Func<Task<T>> taskFactory)
    {
        lock (_taskLocker)
        {
            if (_tasks.TryGetValue(taskId, out Task task))
            {
                // A task with this ID is already in flight: return it if its result type matches.
                if (!(task is Task<T> concreteTask))
                {
                    throw new TaskManagerException($"Task with id {taskId} already exists but it has a different type {task.GetType()}. {typeof(Task<T>)} was expected");
                }
                else
                {
                    return concreteTask;
                }
            }
            else
            {
                // No task with this ID yet: create it and remove it again once it completes.
                Task<T> concreteTask = taskFactory();
                _tasks.Add(taskId, concreteTask);
                _continuations.Add(taskId, concreteTask.ContinueWith(_ => RemoveTask(taskId)));
                return concreteTask;
            }
        }
    }

    private void RemoveTask(string taskId)
    {
        lock (_taskLocker)
        {
            // Dictionary.Remove is a no-op when the key is absent, so no ContainsKey check is needed.
            _tasks.Remove(taskId);
            _continuations.Remove(taskId);
        }
    }
}

The idea is to have a single instance of TaskManager for the lifetime of the application. Any async request that should run only once at a time calls ExecuteOnceAsync, passing a factory method that creates the Task and an application-wide unique ID. For any later request with the same ID, the TaskManager returns the Task instance it created earlier. Only if no Task with that ID exists does the manager call the factory method and start the task. I added locks around task creation and removal to ensure thread safety. To remove the task from the dictionary once it has completed, I attach a continuation with the ContinueWith method, so after the task completes, both the task itself and its continuation are removed.

This seems to me like a fairly common scenario, so I would assume there is a well-established design pattern, or perhaps a C# API, that does exactly this. Any insights or suggestions would be much appreciated.



Solution 1:[1]

I think you can do this with one of the concurrent collection classes, ConcurrentDictionary.

GetOrAdd starts the work only if it finds no existing work for the key, and the caller then waits for it to complete. If the work has been started before, the caller gets the already completed (or still in-progress) task and waits for that instead.

// Add this static dictionary to your class (needs using System.Collections.Concurrent;)

static readonly ConcurrentDictionary<string, Task> tasks = new();

// Add this to the method that should do its work only once

var work = tasks.GetOrAdd(taskId, _ => client.DoSomethingAsync());

await work;

Important

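See Theodor's comment on this answer about the delegate running outside the lock: ConcurrentDictionary.GetOrAdd invokes the value factory outside its internal lock, so two callers racing on the same key can both run the factory and start the work twice, even though only one of the resulting tasks is stored and returned to both of them. This is unfortunate, and it goes back to a design for a concurrent dictionary I made well over 10 years ago (and pitched to Microsoft as a better way to do ConcurrentDictionary).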
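A common way around the factory running twice is to store Lazy<Task<T>> values instead of tasks: GetOrAdd may still build a spare Lazy wrapper during a race, but only the wrapper that ends up in the dictionary ever has its Value evaluated. The sketch below assumes .NET's ConcurrentDictionary and Lazy<T> in its default thread-safe mode; the class name SingleFlight<T>, the async wrapper, and the remove-on-completion step are illustrative additions, not part of the answer above.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical helper: de-duplicates concurrent calls that share a key.
public sealed class SingleFlight<T>
{
    private readonly ConcurrentDictionary<string, Lazy<Task<T>>> _tasks =
        new ConcurrentDictionary<string, Lazy<Task<T>>>();

    public Task<T> ExecuteOnceAsync(string taskId, Func<Task<T>> taskFactory)
    {
        // Extra Lazy instances created in a race are discarded without being evaluated.
        Lazy<Task<T>> lazy = _tasks.GetOrAdd(
            taskId,
            _ => new Lazy<Task<T>>(() => RunFactoryAsync(taskFactory)));

        Task<T> task = lazy.Value; // the factory runs at most once per stored Lazy

        // Drop the entry when the task finishes, but only if the dictionary still holds
        // this exact Lazy (ICollection<KeyValuePair>.Remove matches key AND value).
        // Every caller attaches this; after the first successful removal the rest are no-ops.
        task.ContinueWith(
            _ => ((ICollection<KeyValuePair<string, Lazy<Task<T>>>>)_tasks)
                .Remove(new KeyValuePair<string, Lazy<Task<T>>>(taskId, lazy)),
            TaskScheduler.Default);

        return task;
    }

    // The async wrapper turns a synchronous throw from the factory into a faulted task,
    // so Lazy never caches an exception thrown before a task is returned.
    private static async Task<T> RunFactoryAsync(Func<Task<T>> taskFactory) =>
        await taskFactory().ConfigureAwait(false);
}
```

Callers that arrive while the work is in flight get the same Task<T> instance; once it completes and the entry is removed, the next call starts fresh work.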

I have my own thread-safe dictionary with this method.

bool TryAdd(K key, T value, out T contains);

This simple design returns the newly-added or existing value and affords some awesome patterns, especially for caching/request de-duping; adding a task to get something vs. waiting on an existing task to get it.

I notice ConcurrentDictionary has a TryAdd, but it doesn't return the existing value on failure, which is a shame. Something like this might work, though:

static readonly ConcurrentDictionary<string, Task<Task>> tasks = new();

// In the method that should do its work only once:

var newTask = new Task<Task>(() => DoSomethingAsync());

if (tasks.TryAdd(taskId, newTask))
{
    // Only the caller that wins the race starts its (cold) outer task.
    newTask.Start();
}

var somethingTask = await tasks[taskId];
await somethingTask;

It assumes the tasks are never removed.

Basically, there's an outer Task that only the race winner starts. There needs to be another Task inside it when the work being done is itself async, like creating a file or a network resource.
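ConcurrentDictionary does have a GetOrAdd(key, value) overload that returns whatever value is actually stored for the key, which is close to the "TryAdd that returns the existing value" shape described above. Here is a minimal sketch of the same cold-outer-task idea using it; OnceRunner and RunOnceAsync are hypothetical names, and, like the snippet above, it never removes entries.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class OnceRunner
{
    // Outer tasks are created cold; only the instance that wins GetOrAdd is started.
    // As in the original snippet, entries are never removed.
    private static readonly ConcurrentDictionary<string, Task<Task>> _tasks =
        new ConcurrentDictionary<string, Task<Task>>();

    public static Task RunOnceAsync(string taskId, Func<Task> doSomethingAsync)
    {
        var newTask = new Task<Task>(doSomethingAsync);

        // GetOrAdd(key, value) returns the value stored for the key: either newTask
        // (this caller won) or the instance another caller added first.
        Task<Task> stored = _tasks.GetOrAdd(taskId, newTask);
        if (ReferenceEquals(stored, newTask))
        {
            newTask.Start(TaskScheduler.Default);
        }

        // Unwrap() returns a proxy task that completes when the inner async work completes.
        return stored.Unwrap();
    }
}
```

Each call to Unwrap() produces its own proxy task, so callers receive different Task objects that all complete together with the single piece of work.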

Fingers crossed this one's good.

Solution 2:[2]

If you want to allow multiple tasks with the same taskId but different data types to run at the same time, you could move the generic type <T> from the method to the class. That way each TaskManager<T> (one per type argument) has its own dictionary. It also allows you to make the dictionary store Task<T> and avoid typecasting.

You said that you'll only have a single instance of the TaskManager class, but if you can change that to having a single instance per class that calls into TaskManager, you can avoid needing a dictionary (and therefore hashing the task's name) by giving each class an instance of:

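public class TaskManager<T>
{
    private Task<T> _currentTask;
    private readonly object _lock = new object();

    // taskId is unused here: each TaskManager<T> instance tracks a single task.
    public Task<T> ExecuteOnceAsync(string taskId, Func<Task<T>> taskFactory)
    {
        // Read the field once, so a concurrent RemoveTask can't null it between the check and the return.
        Task<T> current = _currentTask;
        if (current == null)
        {
            lock (_lock)
            {
                current = _currentTask;
                if (current == null)
                {
                    current = taskFactory();
                    // Publish before attaching the continuation, so RemoveTask can't run first.
                    _currentTask = current;
                    current.ContinueWith(RemoveTask);
                }
            }
        }

        return current;
    }

    // The signature matches Action<Task<T>>, so it can be passed to ContinueWith directly.
    private void RemoveTask(Task<T> completedTask)
    {
        _currentTask = null;
    }
}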

If you want to be super correct, you can use Interlocked.Exchange instead of assigning values directly to _currentTask; this matters most in RemoveTask, since ContinueWith runs it outside of the lock. You might want to use an Interlocked method for the null checks as well. The lock isn't needed, by the way, if you know that calls to ExecuteOnceAsync won't happen in parallel, for example if it's called from a button click event handler in a WPF or WinForms app (even if a user clicks multiple times, the handlers run one after another). A web app can't make such guarantees, though.
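A sketch of that stricter variant, assuming .NET's Volatile and Interlocked classes: the field is read and written through Volatile, and the continuation clears it with Interlocked.CompareExchange only if it still holds the task that just completed. The unused taskId parameter is dropped here, which is a deviation from the class above.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class TaskManager<T>
{
    private Task<T> _currentTask; // accessed only through Volatile / Interlocked
    private readonly object _lock = new object();

    public Task<T> ExecuteOnceAsync(Func<Task<T>> taskFactory)
    {
        // Lock-free fast path: reuse the in-flight task if there is one.
        Task<T> current = Volatile.Read(ref _currentTask);
        if (current != null)
        {
            return current;
        }

        lock (_lock)
        {
            current = Volatile.Read(ref _currentTask);
            if (current != null)
            {
                return current;
            }

            current = taskFactory();
            Volatile.Write(ref _currentTask, current);
        }

        // Attached after the field is published, so RemoveTask can never run first
        // and leave a finished task stuck in the field.
        current.ContinueWith(RemoveTask, TaskScheduler.Default);
        return current;
    }

    private void RemoveTask(Task<T> completedTask)
    {
        // Clear the field only if it still holds the task that just completed.
        Interlocked.CompareExchange(ref _currentTask, null, completedTask);
    }
}
```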

You can see I do the null check on _currentTask twice, once outside the lock and once inside it, so that when a task already exists, the caller avoids the cost of acquiring the lock. I don't believe you can use the same trick with the code in your question, because Dictionary<TKey, TValue> isn't safe to read while another thread writes to it: if RemoveTask runs at the same time as TryGetValue, you can get into a bad state. You could switch to a ConcurrentDictionary, which would probably let you avoid the lock inside RemoveTask and only lock in ExecuteOnceAsync when TryGetValue returns false. But whether that improves performance needs to be measured.
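The ConcurrentDictionary idea could look roughly like this sketch (untested for performance). ConcurrentTaskManager<T> is a hypothetical name; lookups are lock-free, the lock only serializes creation, and removal uses the ICollection<KeyValuePair<,>>.Remove implementation, which removes an entry only if both key and value match, so no lock is needed in the continuation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public class ConcurrentTaskManager<T>
{
    private readonly object _createLock = new object();
    private readonly ConcurrentDictionary<string, Task<T>> _tasks =
        new ConcurrentDictionary<string, Task<T>>();

    public Task<T> ExecuteOnceAsync(string taskId, Func<Task<T>> taskFactory)
    {
        // Fast path: safe to run concurrently with removals.
        if (_tasks.TryGetValue(taskId, out Task<T> existing))
        {
            return existing;
        }

        lock (_createLock)
        {
            // Re-check: another caller may have created the task while we waited.
            if (_tasks.TryGetValue(taskId, out existing))
            {
                return existing;
            }

            Task<T> task = taskFactory();
            _tasks[taskId] = task;

            // Runs without the lock: removes the entry only if it still maps to this task.
            task.ContinueWith(
                completed => ((ICollection<KeyValuePair<string, Task<T>>>)_tasks)
                    .Remove(new KeyValuePair<string, Task<T>>(taskId, completed)),
                TaskScheduler.Default);

            return task;
        }
    }
}
```

Adds only happen under the lock and removals only ever remove, so the re-check inside the lock is enough to prevent two tasks being created for one key.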

Another issue with both your code and mine is that when taskFactory is slow, it holds the lock for a long time (remember, when you call an async method, it runs synchronously until the first await on a task that is not already complete). This can be mitigated by using Task<T> concreteTask = Task.Run(() => taskFactory());, but Task.Run has its own overhead, so if taskFactory does indeed await very quickly, it's probably better for performance to keep it as it is. Again, if perf matters, you need to measure.
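To illustrate the parenthetical point, here is a small sketch showing that the code before the first incomplete await runs on the caller's thread, which is why a slow synchronous prefix keeps the caller (and any lock it holds) waiting. SlowStartAsync and the Thread.Sleep delay are hypothetical stand-ins for a factory with slow setup work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SyncPrefixDemo
{
    // Thread that executed the code before the first await in SlowStartAsync.
    public static int ThreadBeforeFirstAwait;

    public static async Task<int> SlowStartAsync()
    {
        ThreadBeforeFirstAwait = Environment.CurrentManagedThreadId;
        Thread.Sleep(200);    // synchronous work: the caller is blocked for this long
        await Task.Delay(10); // first await on an incomplete task: control returns to the caller here
        return 42;
    }

    // The synchronous prefix runs on the calling thread, inside any lock the caller holds.
    public static Task<int> StartDirectly() => SlowStartAsync();

    // Task.Run moves the synchronous prefix to a thread-pool thread,
    // so the caller gets a Task<int> back almost immediately.
    public static Task<int> StartOnThreadPool() => Task.Run(() => SlowStartAsync());
}
```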

And while I'm by no means a perf expert, depending on your perf requirements my suggestion has several improvements:

  • Dictionary lookups are O(1), but so is Thread.Sleep(1000); in other words, an O(1) operation can still be consistently slower than an alternative, even an O(n) one for small values of n. Since your dictionary key is a string, string.GetHashCode is O(n) in the length of the string, and hash collisions can hurt Dictionary performance. My suggestion has only a single task object, so it's also O(1), and it doesn't require hashing anything or searching an array or however the Dictionary is implemented.
  • It's super minor, but since my RemoveTask doesn't need to capture any variables, I can pass it directly to ContinueWith as a method group (its signature matches), whereas you use a lambda that captures taskId, so the compiler has to generate a closure class and allocate an instance of it on the heap each time your code runs, which increases memory pressure. (Passing an instance method group still allocates a delegate, but no closure object.) By itself this won't have any significant perf impact unless run in a very tight loop, but it can contribute to "death by 1000 cuts" if you have many other bits of code that create small objects on the heap.
  • As already mentioned above, my code is lock-free when a task already exists.
  • Your code effectively uses a global application lock, and this is the most important perf killer. If your TaskManager is used every time a database request is made, your implementation can only start one database query at a time, even though two tasks with different taskIds can never return the same task. And RemoveTask, regardless of which task completes, blocks all calls to ExecuteOnceAsync while it holds the lock. By using a different instance of TaskManager per query type, you can prevent GetProductsAsync from blocking GetCustomerInformation.

Having written all that, if your goal is not to use one TaskManager per application, but instead use one TaskManager per database query method, meaning that taskId is actually a representation of the query parameters of GetProductsAsync, then you should ignore everything I wrote :) Well, almost everything. You can still move the <T> to the class and dictionary definitions, and avoid doing typecasting in your method.
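For the one-manager-per-query-method case, moving <T> to the class might look like the following sketch. KeyedTaskManager<T> is a hypothetical name; it keeps the question's single lock and Dictionary, but stores Task<T> directly (no type check or cast), drops the separate _continuations dictionary, and removes an entry only if it still refers to the task that just completed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class KeyedTaskManager<T>
{
    private readonly object _taskLocker = new object();
    private readonly Dictionary<string, Task<T>> _tasks = new Dictionary<string, Task<T>>();

    public Task<T> ExecuteOnceAsync(string taskId, Func<Task<T>> taskFactory)
    {
        lock (_taskLocker)
        {
            // The dictionary already holds Task<T>, so no type check or cast is needed.
            if (_tasks.TryGetValue(taskId, out Task<T> existing))
            {
                return existing;
            }

            Task<T> task = taskFactory();
            _tasks.Add(taskId, task);
            task.ContinueWith(completed => RemoveTask(taskId, completed), TaskScheduler.Default);
            return task;
        }
    }

    private void RemoveTask(string taskId, Task<T> completed)
    {
        lock (_taskLocker)
        {
            // Remove only if the entry still refers to the task that just finished.
            if (_tasks.TryGetValue(taskId, out Task<T> current) && current == completed)
            {
                _tasks.Remove(taskId);
            }
        }
    }
}
```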

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2