Grouping by multiple keys in a List<Dictionary<string, string>>

I have a list of dictionaries like this:

var result = new List<Dictionary<string, string>>
    {
        new Dictionary<string, string> { {"Continent", "Europe"}, {"Country", "Italy"}, {"Population", "10"} },
        new Dictionary<string, string> { {"Continent", "Africa"}, {"Country", "RSA"}, {"Population", "20"} },
        new Dictionary<string, string> { {"Continent", "Africa"}, {"Country", "RSA"}, {"Population", "30"} },
        new Dictionary<string, string> { {"Continent", "Africa"}, {"Country", "Nigeria"}, {"Population", "40"} },
    };

I would like to group by Continent first, then by Country, and then calculate the sum of the Population field within each resulting group. For the example above, that gives:

  • GROUP_1: Europe, Italy => Sum: 10
  • GROUP_2: Africa, RSA => Sum: 20 + 30 = 50
  • GROUP_3: Africa, Nigeria => Sum: 40

I can do this with nested grouping, as follows:

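// Requires: using System; using System.Linq;
var resultGroupedByContinent = result.GroupBy(o => o["Continent"]);

foreach (var continent in resultGroupedByContinent)
{
    // Group only the rows of the current continent by country
    var resultPerCountry = continent.GroupBy(o => o["Country"]);

    foreach (var country in resultPerCountry)
    {
        // Sum the Population values of this (continent, country) group
        var sum = country.Sum(o => int.Parse(o["Population"]));
        Console.WriteLine($"{continent.Key}, {country.Key} => Sum: {sum}");
    }
}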
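The nested loops can also be written as a single LINQ query, where `SelectMany` flattens the per-continent country groups into one sequence. This is a minimal sketch, assuming every dictionary contains the `Continent`, `Country` and `Population` keys and that `Population` always parses as an integer; the names `NestedGroupingSketch` and `SumPerCountry` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NestedGroupingSketch
{
    // Hypothetical helper: groups by Continent, then by Country inside each continent,
    // and returns one (Continent, Country, Sum) entry per country group.
    public static List<(string Continent, string Country, int Sum)> SumPerCountry(
        IEnumerable<Dictionary<string, string>> rows)
    {
        return rows
            .GroupBy(o => o["Continent"])
            .SelectMany(continent => continent
                .GroupBy(o => o["Country"])
                .Select(country => (
                    Continent: continent.Key,
                    Country: country.Key,
                    // Assumes Population is always a valid integer string
                    Sum: country.Sum(o => int.Parse(o["Population"])))))
            .ToList();
    }
}
```

This keeps the two-level structure explicit while avoiding the hand-written loops; it still builds the intermediate groups, so it is a readability change rather than a performance one.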

Is there a better way to implement this, making better use of LINQ grouping?

In the real scenario I will be dealing with:

  • a list with a few million items in total (result.Count() is a few million)
  • 70 key-value pairs in each dictionary
  • 4 levels of grouping (Continent -> Country -> 3rd level -> 4th level)
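For four grouping levels over a few million rows, one option is to skip the nested groups and accumulate the sums in a single pass into a `Dictionary` keyed by a value tuple of the four grouping columns. This is a minimal sketch, assuming the value column always parses as a `long`; `CompositeKeySketch`, `SumByFourKeys` and the column-name parameters are hypothetical, since the names of the 3rd and 4th levels are not given.

```csharp
using System;
using System.Collections.Generic;

public static class CompositeKeySketch
{
    // Hypothetical helper: one pass over the rows, summing valueColumn per
    // (level1, level2, level3, level4) combination without materializing groups.
    public static Dictionary<(string, string, string, string), long> SumByFourKeys(
        IEnumerable<IReadOnlyDictionary<string, string>> rows,
        string level1, string level2, string level3, string level4,
        string valueColumn)
    {
        var sums = new Dictionary<(string, string, string, string), long>();
        foreach (var row in rows)
        {
            var key = (row[level1], row[level2], row[level3], row[level4]);
            // TryGetValue leaves current at 0 when the key is seen for the first time
            sums.TryGetValue(key, out long current);
            sums[key] = current + long.Parse(row[valueColumn]);
        }
        return sums;
    }
}
```

Totals for coarser levels (for example per continent only) can then be computed by re-aggregating the much smaller result dictionary instead of the raw rows.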

I should add that List<Dictionary<string, string>> can be replaced with a different data structure: the source is actually the following JSON, which I then run through a JSON deserializer:

{
  "header": [ "Continent", "Country", "Population" ],
  "errors": [],
  "rows": [
    [ "Europe", "Italy", "10" ],
    [ "Africa", "RSA", "20" ],
    [ "Africa", "RSA", "30" ],
    [ "Africa", "Nigeria", "40" ]]
}
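Because the payload is already columnar (a header plus positional rows), one option is to deserialize it straight into a small class mirroring that shape instead of into one dictionary per row. This sketch assumes System.Text.Json is available (.NET Core 3.0 or later); the `Payload` class and its `ColumnIndex`/`Parse` members are hypothetical, and since the element type of `errors` is not visible in the sample, it is kept as raw `JsonElement`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical type mirroring the JSON shape shown in the question.
public class Payload
{
    [JsonPropertyName("header")]
    public List<string> Header { get; set; } = new List<string>();

    // The element type of "errors" is unknown from the sample, so keep it as raw JSON.
    [JsonPropertyName("errors")]
    public List<JsonElement> Errors { get; set; } = new List<JsonElement>();

    [JsonPropertyName("rows")]
    public List<string[]> Rows { get; set; } = new List<string[]>();

    // Position of a named column within every row, looked up from the header.
    public int ColumnIndex(string name)
    {
        int index = Header.IndexOf(name);
        if (index < 0) throw new KeyNotFoundException($"Column '{name}' not in header");
        return index;
    }

    public static Payload Parse(string json) =>
        JsonSerializer.Deserialize<Payload>(json)
        ?? throw new JsonException("Payload was null");
}
```

With the sample above, `ColumnIndex("Population")` is 2, so each row's population is simply `row[2]`.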

I will receive such a JSON payload with a few million rows every 8 seconds, so that is my upper limit for the calculation time.
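Given that time budget, a possible approach is to aggregate the raw `string[]` rows directly by column index, with an arbitrary number of key columns, so no 70-entry dictionary is built per row. This is a sketch only, assuming the value column parses as a `long`; `KeyArrayComparer`, `RowAggregationSketch` and `SumBy` are hypothetical names, and whether it fits within 8 seconds would need measuring on real data.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer so a string[] of key values can serve as a dictionary key.
public sealed class KeyArrayComparer : IEqualityComparer<string[]>
{
    public bool Equals(string[] x, string[] y) =>
        ReferenceEquals(x, y) || (x != null && y != null && x.SequenceEqual(y));

    public int GetHashCode(string[] key)
    {
        var hash = new HashCode();
        foreach (var part in key) hash.Add(part);
        return hash.ToHashCode();
    }
}

public static class RowAggregationSketch
{
    // Sums the column at valueIndex per distinct combination of the columns at keyIndexes.
    public static Dictionary<string[], long> SumBy(
        IEnumerable<string[]> rows, int[] keyIndexes, int valueIndex)
    {
        var sums = new Dictionary<string[], long>(new KeyArrayComparer());
        foreach (var row in rows)
        {
            // Copy the grouping columns of this row into the key
            var key = new string[keyIndexes.Length];
            for (int i = 0; i < keyIndexes.Length; i++) key[i] = row[keyIndexes[i]];

            sums.TryGetValue(key, out long current);
            sums[key] = current + long.Parse(row[valueIndex]);
        }
        return sums;
    }
}
```

In practice the indexes would come from the `header` array (for the sample, keys `{ 0, 1 }` and value `2`). For a fixed number of levels, a value-tuple key avoids allocating a key array per row.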



Solution 1:[1]

If all you have is the Dictionary<string, string>, I would first convert each dictionary into a strongly typed object and then group by a composite key, like this:

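    using System.Collections.Generic;

    // Strongly typed representation of one row
    public class MyObject
    {
        // Copies the needed fields out of one row dictionary, parsing Population once
        public MyObject(Dictionary<string, string> dict)
        {
            Country = dict["Country"];
            Continent = dict["Continent"];
            Population = int.Parse(dict["Population"]);
        }
        public string Country { get; set; }
        public string Continent { get; set; }
        public int Population { get; set; }
    }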

--------------------------------------------------

            var result = new List<Dictionary<string, string>>
            {
                new Dictionary<string, string> { {"Continent", "Europe"}, {"Country", "Italy"}, {"Population", "10"} },
                new Dictionary<string, string> { {"Continent", "Africa"}, {"Country", "RSA"}, {"Population", "20"} },
                new Dictionary<string, string> { {"Continent", "Africa"}, {"Country", "RSA"}, {"Population", "30"} },
                new Dictionary<string, string> { {"Continent", "Africa"}, {"Country", "Nigeria"}, {"Population", "40"} },
            };
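            // Convert each row dictionary into a typed object
            var mapped = result.Select(x => new MyObject(x)).ToList();
            // Group by the composite (Continent, Country) key and sum Population per group
            var sum = mapped
                .GroupBy(x => new { x.Continent, x.Country })
                .ToDictionary(x => x.Key, x => x.Sum(y => y.Population));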

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Demetrius Axenowski