'How to group values in a column (R)

I'm creating a summary table that groups my records by country of destination:

SummarybyLocation <- PSTNRecords %>% 
                           group_by(Destination) %>%
                                summarize(
                                  Calls = n(), 
                                  Minutes = sum(durationMinutes), 
                                  MaxDuration = max(durationMinutes),
                                  AverageDuration = mean(durationMinutes), 
                                  Charges = sum(charge),
                                  Fees = sum(connectionCharge)
                                )

SummarybyLocation

The resulting table is as follows:

Summarized table

I realized that the Destination values are inconsistent (for example, "France" and "FR" both refer to the same area, and then I have a "North America" that I presume gathers USA and Canada.

I was wondering if there's a way of creating custom groups for these values, so that the aggregation would make more sense. I tried to use the countrycode package to add an iso2c column, but that doesn't resolve the problem of managing other area aggregations like "North America".

I would really appreciate some suggestions on how to handle this.

Thanks in advance!



Solution 1:[1]

The {countrycode} package can deal with custom names/codes easily...

library(tidyverse)
library(countrycode)

PSTNRecords <- tibble::tribble(
  ~Destination,    ~durationMinutes, ~charge, ~connectionCharge,
  "FR",            1,                2.5,     0.3,
  "France",        1,                2.5,     0.3,
  "United States", 1,                2.5,     0.3,
  "USA",           1,                2.5,     0.3,
  "North America", 1,                2.5,     0.3
)

# see what special codes/country names you have to deal with
iso3cs <- countrycode(PSTNRecords$Destination, "country.name", "iso3c", warn = FALSE)
unique(PSTNRecords$Destination[is.na(iso3cs)])
#> [1] "FR"            "North America"

# decde how to deal with them
custom_matches <- c("FR" = "FRA", "North America" = "USA")

# use your custom codes
PSTNRecords %>%
  mutate(iso3c = countrycode(Destination, "country.name", "iso3c", custom_match = custom_matches))
#> # A tibble: 5 × 5
#>   Destination   durationMinutes charge connectionCharge iso3c
#>   <chr>                   <dbl>  <dbl>            <dbl> <chr>
#> 1 FR                          1    2.5              0.3 FRA  
#> 2 France                      1    2.5              0.3 FRA  
#> 3 United States               1    2.5              0.3 USA  
#> 4 USA                         1    2.5              0.3 USA  
#> 5 North America               1    2.5              0.3 USA

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 CJ Yetman