'c# linq - aggregate values to a list property

Consider a hypothetical scenario where a huge used car dealer (e.g. CarMax) has many dealerships in different states.

public class Inventory
{
    public string Make { get; set; }
    public string Model { get; set; }
    public string Year { get; set; }
    public string State { get; set; }
}

List<Inventory> cars = new List<Inventory>();
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2014", State = "MT" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2014", State = "AR" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2014", State = "OH" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "AZ" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "OR" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "MN" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2015", State = "KY" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2020", State = "FL" } );
cars.Add( new Inventory { Make = "Ford", Model = "F-150", Year = "2020", State = "GA" } );
cars.Add( new Inventory { Make = "Ford", Model = "Ranger", Year = "2010", State = "TN" } );
cars.Add( new Inventory { Make = "Ford", Model = "Ranger", Year = "2010", State = "WY" } );
cars.Add( new Inventory { Make = "Ford", Model = "Ranger", Year = "2012", State = "WY" } );

From cars I would like to create a list of objects with every combination of Make, Model and Year properties, each with a States property that contains a list of all states that have vehicles matching those first three properties:

var vehicles[0]: Make = "Ford", Model = "F-150", Year = "2014", States = { "MT", "AR", "OH" }
var vehicles[1]: Make = "Ford", Model = "F-150", Year = "2015", States = { "AZ", "OR", "MN", "KY" }
var vehicles[2]: Make = "Ford", Model = "F-150", Year = "2020", States = { "FL", "GA" }
var vehicles[3]: Make = "Ford", Model = "Ranger", Year = "2010", States = { "TN", "WY" }
var vehicles[4]: Make = "Ford", Model = "Ranger", Year = "2012", States = { "WY" }

I've looked into using a self-join GroupJoin linq method approach as well as a nested linq query, but I'm stumbling on trying to create the States child collection in the process.

One catch: the dataset that I'm working with contains upwards of 100-million rows (in this example Make/Model/Year/State combinations) with 6 grouping properties rather than 3. As such I'm trying to avoid, if possible, any multi-step processes that involve looping through the data.



Solution 1:[1]

Consider a Dictionary where the key is a Make, Model, Year combination and the vales are a Set of strings (states)

var result = new Dictionary<(string make, string model, string year), HashSet<string>>();

The important thing here is that the dictionary will not allow you to have more than 1 item per make/model/year, and the hash set will not allow you to have more than 1 state. Once you have the right collection types, the population code basically writes itself

foreach (var item in items)
{
    var key = (item.Make, item.Model, item.Year);
    if (!dictionary.TryGetValue(key, out var states))
    {
        states = new HashSet<string>();
        dictionary.Add(key, states);
    }

    states.Add(item.State);
}

Solution 2:[2]

I think we can use lambda GroupBy and string.Join to iterator collection, which might not need to use GroupJoin

cars.GroupBy(x=> new {x.Make,x.Model,x.Year})
    .Select(x=> new {
        x.Key.Make,
        x.Key.Year,
        x.Key.Model,
        State = string.Join(",",x.Select(z=>z.State))
    });

if you want to make it faster we can try to use PLINQ to instead

EDIT

if your dataset is huge data, we can try to use PLINQ help us run linq parallelly as below

cars.AsParallel()
    .GroupBy(x=> new {x.Make,x.Model,x.Year})
    .Select(x=> new {
        x.Key.Make,
        x.Key.Year,
        x.Key.Model,
        State = string.Join(",",x.Select(z=>z.State)).ToList()
    });
    

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2