R: arithmetic on separate dataframes with row titles
I imported census data from two different years and did some cleanup. Not all of the variables are numeric, but I want to compute the difference between the two years and save the result in a new dataframe. Here is some sample data.
df1 <- data.frame(country = c("Brazil", "Columbia", "Hong Kong"), ward_1 = c(35,25,33), ward_2 = c(22,33,44), continent = c("South America", "South America", "Asia"))
df2 <- data.frame(country = c("Brazil", "Hong Kong", "Columbia"), ward_1 = c(45,62,26), ward_2 = c(77,55,67), continent = c("South America", "Asia", "South America"))
Is there a function that can match the entries in the first column and then do the arithmetic on columns 2 and 3? Or should the first column be sorted alphabetically before doing any arithmetic?
How do you do arithmetic when there are non-numeric variables in the dataframe?
Solution 1:[1]
Start by matching the countries in both data.frames with match, which returns the row positions in df2 that correspond to each row of df1, then subtract using that index.
i <- match(df1$country, df2$country)
df1$ward_1 - df2$ward_1[i]
#> [1] -10 -1 -29
df1$ward_2 - df2$ward_2[i]
#> [1] -55 -34 -11
Created on 2022-03-01 by the reprex package (v2.0.1)
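One thing to watch with this approach: match returns NA for any country in the first data frame that is absent from the second, and the NA carries through the subtraction. The following is a minimal sketch of that behaviour; df2_partial is a hypothetical data set invented for illustration and is not part of the question.

```r
df1 <- data.frame(country = c("Brazil", "Columbia", "Hong Kong"),
                  ward_1 = c(35, 25, 33))

# Hypothetical second year in which "Columbia" is missing (assumption).
df2_partial <- data.frame(country = c("Brazil", "Hong Kong"),
                          ward_1 = c(45, 62))

# Position of each df1 country in df2_partial; NA where there is no match.
i <- match(df1$country, df2_partial$country)
i
#> [1]  1 NA  2

# TRUE signals that at least one country could not be matched.
anyNA(i)
#> [1] TRUE

# The unmatched row yields NA instead of a silently wrong difference.
d <- df1$ward_1 - df2_partial$ward_1[i]
d
#> [1] -10  NA -29
```

Checking anyNA(i) before subtracting is a cheap way to find out whether the two years cover the same countries.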
To automate the differences, Map or mapply can apply the subtraction to several columns at once, pairing each column of df1 with the corresponding column of the reordered df2.
Map(\(x, y) x - y, df1[2:3], df2[i, 2:3])
#> $ward_1
#> [1] -10 -1 -29
#>
#> $ward_2
#> [1] -55 -34 -11
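For comparison, here is a sketch of the mapply variant mentioned above, using the same sample data. Because every column yields a result of the same length, mapply simplifies the output to a numeric matrix rather than a list. Passing the `-` operator directly instead of an anonymous function is a choice made for this example.

```r
df1 <- data.frame(country = c("Brazil", "Columbia", "Hong Kong"),
                  ward_1 = c(35, 25, 33), ward_2 = c(22, 33, 44),
                  continent = c("South America", "South America", "Asia"))
df2 <- data.frame(country = c("Brazil", "Hong Kong", "Columbia"),
                  ward_1 = c(45, 62, 26), ward_2 = c(77, 55, 67),
                  continent = c("South America", "Asia", "South America"))

# Reorder df2's rows to follow df1's country order.
i <- match(df1$country, df2$country)

# Subtract column by column; the result is a matrix with one column per ward.
m <- mapply(`-`, df1[2:3], df2[i, 2:3])
m
#>      ward_1 ward_2
#> [1,]    -10    -55
#> [2,]     -1    -34
#> [3,]    -29    -11
```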
The resulting list can then be coerced to a data.frame.
as.data.frame(Map(\(x, y) x - y, df1[2:3], df2[i, 2:3]))
#> ward_1 ward_2
#> 1 -10 -55
#> 2 -1 -34
#> 3 -29 -11
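The question also asks how to handle the non-numeric variables and how to save the result in a new dataframe. One possible sketch, building on the match index above: pick out the numeric columns with is.numeric instead of hard-coding 2:3, and write the differences into a copy of df1 so that country and continent stay attached as labels. The names num_cols and diff_df are introduced here for illustration, and the sketch assumes the numeric columns have the same names in both data frames.

```r
df1 <- data.frame(country = c("Brazil", "Columbia", "Hong Kong"),
                  ward_1 = c(35, 25, 33), ward_2 = c(22, 33, 44),
                  continent = c("South America", "South America", "Asia"))
df2 <- data.frame(country = c("Brazil", "Hong Kong", "Columbia"),
                  ward_1 = c(45, 62, 26), ward_2 = c(77, 55, 67),
                  continent = c("South America", "Asia", "South America"))

# Reorder df2's rows to follow df1's country order.
i <- match(df1$country, df2$country)

# Names of the numeric columns in df1 (assumed numeric in df2 as well).
num_cols <- names(df1)[sapply(df1, is.numeric)]

# Copy df1 to keep the non-numeric columns, then overwrite the
# numeric columns with the df1 - df2 differences.
diff_df <- df1
diff_df[num_cols] <- Map(`-`, df1[num_cols], df2[i, num_cols])
diff_df
#>     country ward_1 ward_2     continent
#> 1    Brazil    -10    -55 South America
#> 2  Columbia     -1    -34 South America
#> 3 Hong Kong    -29    -11          Asia
```

With this layout there is no need to sort either data frame alphabetically first, since match takes care of the row alignment.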
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rui Barradas |
