'Mathematica equivalent of Pandas groupby and sum

I imported a World Health Organization (WHO) csv file with Covid-19 cases per country from January 2020 into Mathematica. The file is a table with "Date Reported", "Country Code", "Country Name", "WHO Region", "New Cases", "Cumulative Cases", "New Deaths", "Cumulative Deaths" as columns (see image).

I want to group the countries by the same "WHO Region" and sum the "New Cases", "Cumulative Cases", "New Deaths" and "Cumulative Deaths" that have the same "Date Reported" (for example sum all the cumulative cases in Europe in each day). I can easily do that in Python using the Pandas library with the functions groupby() and sum(), like in the code below:

import pandas as pd
worldCases = pd.read_csv("WHO-COVID-19-global-data_02April2022.csv",usecols=[0,2,3,4,5,6,7])
regionCases = worldCases.groupby(['WHO_region','Date_reported']).sum() 

How could I achieve the same goal in Mathematica?

The csv file can be downloaded from here: https://dl.dropboxusercontent.com/s/80ytphej636239x/WHO-COVID-19-global-data_02April2022.csv?dl=0



Solution 1:[1]

Tabulating for each date with a basic method first

data = Import[
   "https://dl.dropboxusercontent.com/s/80ytphej636239x/WHO-COVID-19-global-data_02April2022.csv"];

regions = Union[Rest[data[[All, 4]]]];

splitout = Cases[data, {_, _, _, #, __}] & /@ regions;

dates = Union[Rest[data[[All, 1]]]];

eachdate[onedate_] := Module[{},
  splitout = Cases[data,
     {onedate, _, _, #, __}] & /@ regions;

  countup[onesplit_] := Module[{totals},
    totals = Total[onesplit[[All, {5, 6, 7, 8}]]];
    NumberForm[#, DigitBlock -> 3] & /@ totals];

  Labeled[
   TableForm[countup /@ splitout, TableAlignments -> Right,
    TableHeadings -> {regions, data[[1, {5, 6, 7, 8}]]}],
   onedate, {{Top, Left}}]]

(* Just the last three dates *)
eachdate /@ Take[dates, -3]

-tables not shown-

Plotting over the dates

eachdate[onedate_] := Module[{},
  splitout = Cases[data,
     {onedate, _, _, #, __}] & /@ regions;
  totals = Total[#[[All, {5, 6, 7, 8}]]] & /@ splitout;
  MapAt[{onedate, #} &, totals, {All, All}]]

set = eachdate /@ dates;

newCases = set[[All, All, 1]];
cumulativeCases = set[[All, All, 2]];
newDeaths = set[[All, All, 3]];
cumulativeDeaths = set[[All, All, 4]];

DateListPlot[Transpose[newCases],
 PlotLegends -> regions, PlotRange -> All,
 PlotLabel -> data[[1, 5]]]

enter image description here

DateListPlot[Transpose[cumulativeCases],
 PlotLegends -> regions, PlotRange -> All,
 PlotLabel -> data[[1, 6]]]

enter image description here

DateListPlot[Transpose[newDeaths],
 PlotLegends -> regions, PlotRange -> All,
 PlotLabel -> data[[1, 7]]]

enter image description here

DateListPlot[Transpose[cumulativeDeaths],
 PlotLegends -> regions, PlotRange -> All,
 PlotLabel -> data[[1, 8]]]

enter image description here

Equivalent tabulation procedure using a Dataset

headers = First[data];

dataset = Dataset[Association[
      Thread[headers -> #]] & /@ Rest[data]];

cols = Take[headers, -4];

datatable = dataset[
   GroupBy["WHO_region"], GroupBy["Date_reported"],
   Total, cols];

sampledate = Last[dates];

Labeled[TableForm[Outer[
    datatable[#1][sampledate][#2] &, regions, cols] /. 
   x_Integer :> NumberForm[x, DigitBlock -> 3],
  TableAlignments -> Right, TableHeadings -> {regions, cols}],
 sampledate, {{Top, Left}}]
2022-03-30
         New_cases  Cumulative_cases  New_deaths  Cumulative_deaths
-------------------------------------------------------------------
 AFRO        2,812         8,580,381          24            171,086
 AMRO      105,798       150,709,372       1,785          2,700,130
 EMRO        3,789        21,576,432          62            340,628
 EURO      718,203       202,693,049       1,450          1,940,630
Other            0               764           0                 13
SEARO       33,699        57,130,457         220            778,986
 WPRO      527,512        46,071,142         826            211,262

And a sample chart

DateListPlot[Map[Function[region,
   {#, datatable[region][#][cols[[1]]]} & /@ dates], regions],
 PlotLegends -> regions, PlotRange -> All, PlotLabel -> cols[[1]]]

enter image description here

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1