'Mathematica equivalent of Pandas groupby and sum
I imported a World Health Organization (WHO) csv file with Covid-19 cases per country from January 2020 into Mathematica. The file is a table with "Date Reported", "Country Code", "Country Name", "WHO Region", "New Cases", "Cumulative Cases", "New Deaths", "Cumulative Deaths" as columns (see image).
I want to group the countries by the same "WHO Region" and sum the "New Cases", "Cumulative Cases", "New Deaths" and "Cumulative Deaths" that have the same "Date Reported" (for example sum all the cumulative cases in Europe in each day). I can easily do that in Python using the Pandas library with the functions groupby() and sum(), like in the code below:
import pandas as pd
worldCases = pd.read_csv("WHO-COVID-19-global-data_02April2022.csv",usecols=[0,2,3,4,5,6,7])
regionCases = worldCases.groupby(['WHO_region','Date_reported']).sum()
How could I achieve the same goal in Mathematica?
The csv file can be downloaded from here: https://dl.dropboxusercontent.com/s/80ytphej636239x/WHO-COVID-19-global-data_02April2022.csv?dl=0
Solution 1:[1]
Tabulating for each date with a basic method first
data = Import[
"https://dl.dropboxusercontent.com/s/80ytphej636239x/WHO-COVID-19-global-data_02April2022.csv"];
regions = Union[Rest[data[[All, 4]]]];
splitout = Cases[data, {_, _, _, #, __}] & /@ regions;
dates = Union[Rest[data[[All, 1]]]];
eachdate[onedate_] := Module[{},
splitout = Cases[data,
{onedate, _, _, #, __}] & /@ regions;
countup[onesplit_] := Module[{totals},
totals = Total[onesplit[[All, {5, 6, 7, 8}]]];
NumberForm[#, DigitBlock -> 3] & /@ totals];
Labeled[
TableForm[countup /@ splitout, TableAlignments -> Right,
TableHeadings -> {regions, data[[1, {5, 6, 7, 8}]]}],
onedate, {{Top, Left}}]]
(* Just the last three dates *)
eachdate /@ Take[dates, -3]
-tables not shown-
Plotting over the dates
eachdate[onedate_] := Module[{},
splitout = Cases[data,
{onedate, _, _, #, __}] & /@ regions;
totals = Total[#[[All, {5, 6, 7, 8}]]] & /@ splitout;
MapAt[{onedate, #} &, totals, {All, All}]]
set = eachdate /@ dates;
newCases = set[[All, All, 1]];
cumulativeCases = set[[All, All, 2]];
newDeaths = set[[All, All, 3]];
cumulativeDeaths = set[[All, All, 4]];
DateListPlot[Transpose[newCases],
PlotLegends -> regions, PlotRange -> All,
PlotLabel -> data[[1, 5]]]
DateListPlot[Transpose[cumulativeCases],
PlotLegends -> regions, PlotRange -> All,
PlotLabel -> data[[1, 6]]]
DateListPlot[Transpose[newDeaths],
PlotLegends -> regions, PlotRange -> All,
PlotLabel -> data[[1, 7]]]
DateListPlot[Transpose[cumulativeDeaths],
PlotLegends -> regions, PlotRange -> All,
PlotLabel -> data[[1, 8]]]
Equivalent tabulation procedure using a Dataset
headers = First[data];
dataset = Dataset[Association[
Thread[headers -> #]] & /@ Rest[data]];
cols = Take[headers, -4];
datatable = dataset[
GroupBy["WHO_region"], GroupBy["Date_reported"],
Total, cols];
sampledate = Last[dates];
Labeled[TableForm[Outer[
datatable[#1][sampledate][#2] &, regions, cols] /.
x_Integer :> NumberForm[x, DigitBlock -> 3],
TableAlignments -> Right, TableHeadings -> {regions, cols}],
sampledate, {{Top, Left}}]
2022-03-30 New_cases Cumulative_cases New_deaths Cumulative_deaths ------------------------------------------------------------------- AFRO 2,812 8,580,381 24 171,086 AMRO 105,798 150,709,372 1,785 2,700,130 EMRO 3,789 21,576,432 62 340,628 EURO 718,203 202,693,049 1,450 1,940,630 Other 0 764 0 13 SEARO 33,699 57,130,457 220 778,986 WPRO 527,512 46,071,142 826 211,262
And a sample chart
DateListPlot[Map[Function[region,
{#, datatable[region][#][cols[[1]]]} & /@ dates], regions],
PlotLegends -> regions, PlotRange -> All, PlotLabel -> cols[[1]]]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |




