'r - Select a numerical value in a Matrix which is based of another string dataset (Distance between two countries)

I have a dataset(1) of investments where the target and the host country are listed. In addition, I have a matrix(2) which shows the distance between all countries. Right now I would like to add a column in the first data set which contains the distance between the target and home country in each row.

The first data set looks like this (with values below):

targetC year Comp_id homeC sales assets profit Distance_Target_Home (this column would be the goal)
ABW     2008 AL8234  ALB   74839 75342  976857 8543

and the second (with the distance in between the countries):

   ABW AFG   AGO  ALB  ANT 
ABW    3455  2456 8543 1342
AFG
AGO
ALB
ANT

Thanks a lot



Solution 1:[1]

Assuming that df_distances is your second dataframe (with distances), you can reshape it to long format like this:

## for testing you can use this minimal distance dataframe:
df_distances <- structure(list(ABW = c(10L, 7L), AFG = c(9L, 5L)), class = "data.frame", row.names = c("ABW", "AFG"))


## > df_distances
##
##     ABW AFG
## ABW   9  10
## AFG   3   4

... now reshape (stack) df_distances to long format so that homeC and targetC receive their own column each:

library(tidyr)
library(dplyr)

df_distances <- df_distances %>% 
  rownames_to_column('targetC') %>%
  pivot_longer(cols = -1,
               names_to = 'homeC',
               values_to = 'Distance_Target_Home'
               )

## > df_distances
## # A tibble: 4 x 3
##   targetC homeC Distance_Target_Home
##   <chr>   <chr>                <int>
## 1 ABW     ABW                      9
## 2 ABW     AFG                     10
## 3 AFG     ABW                      3
## 4 AFG     AFG                      4

... join with df1:

df1 %>%  left_join(df_distances)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 I_O