'r - Select a numerical value in a Matrix which is based of another string dataset (Distance between two countries)
I have a dataset(1) of investments where the target and the host country are listed. In addition, I have a matrix(2) which shows the distance between all countries. Right now I would like to add a column in the first data set which contains the distance between the target and home country in each row.
The first data set looks like this (with values below):
targetC year Comp_id homeC sales assets profit Distance_Target_Home (this column would be the goal)
ABW 2008 AL8234 ALB 74839 75342 976857 8543
and the second (with the distance in between the countries):
ABW AFG AGO ALB ANT
ABW 3455 2456 8543 1342
AFG
AGO
ALB
ANT
Thanks a lot
Solution 1:[1]
Assuming that df_distances is your second dataframe (with distances), you can reshape it to long format like this:
## for testing you can use this minimal distance dataframe:
df_distances <- structure(list(ABW = c(10L, 7L), AFG = c(9L, 5L)), class = "data.frame", row.names = c("ABW", "AFG"))
## > df_distances
##
## ABW AFG
## ABW 9 10
## AFG 3 4
... now reshape (stack) df_distances to long format so that homeC and targetC receive their own column each:
library(tidyr)
library(dplyr)
df_distances <- df_distances %>%
rownames_to_column('targetC') %>%
pivot_longer(cols = -1,
names_to = 'homeC',
values_to = 'Distance_Target_Home'
)
## > df_distances
## # A tibble: 4 x 3
## targetC homeC Distance_Target_Home
## <chr> <chr> <int>
## 1 ABW ABW 9
## 2 ABW AFG 10
## 3 AFG ABW 3
## 4 AFG AFG 4
... join with df1:
df1 %>% left_join(df_distances)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | I_O |
