'get minimum value difference and case in data.table R

I have a data.table in R that looks like this:

  State     City City.Population  num.stores
state A   City_1             523           5
state A   City_2             456          NA
state A   City_3            1230          52 
state A   City_4             780          NA
state B   City_5             788          NA
state B   City_6             111          15
state B   City_7             897          NA 
state B   City_8               5          48

and I would like to get for each City another city's num.stores by comparing another city's population within a state. For example in City_1 from State A I would get that City 2 from State A would be more similar in population than City 3, because the difference between their populations is 67 (between City_1 and City_2 both from state A) compared to 707 (City_1 vs City_3), therefore assigning 5 stores to City_2. My end result would look like this:

  State     City City.Population  num.stores  assigned.stores  similar_pop_city
state A   City_1             523           5                5
state A   City_2             456          NA                5            City_1  (City_1 is closer in population and not null)
state A   City_3            1230          52               52 
state A   City_4             780          NA               52            City_1  (City_1 is closer in population and not null)
state B   City_5             788          NA               15            City_6  (City_6 is closer in population and not null)
state B   City_6             111          15               15
state B   City_7             897          NA               15            City_6  (City_6 is closer in population and not null)
state B   City_8               5          48                5

I tried to do something as following but I'm still missing the correct logic to execute it:

d.f[, which.min(City.Population -City.Population), .(State)]

but is returning only 1's.

P.D I should also exclude the same City when comparing against the other cities within the state

r data.table

Solution 1:^[1]

I'm not sure if I got the wrong output or if your desired output still has some errors in it (for example, for City_4 you assigned City_4 (itself) as the Closest.City).

I started by assigning Closest.City of City not having NA in num.stores to City containing NA in num.stores. I afterwards created the column assigned.stores, copying num.stores from City with known num.stores, and otherwise assigning num.stores of the Closest.City.

Data

State <- c(rep("State A", 4), rep("State B", 4))
City <- c("City_1", "City_2", "City_3", "City_4", "City_5", "City_6", "City_7", "City_8")
City.Population <- c(523, 456, 1230, 780, 788, 111, 897, 5)
num.stores <- c(5, NA, 52, NA, NA, 15, NA, 48)
d.f <- data.frame(State, City, City.Population, num.stores)

Code

d.f %>%
  group_by(State) %>%
  mutate(Closest.City = ifelse(is.na(num.stores),
     apply(sapply(City.Population[!is.na(num.stores)], function(i) abs(i - City.Population[!is.na(num.stores)])), 2, 
                              function(n) City[which(n == sort(n)[2])]), NA)) %>%
  mutate(assigned.stores = ifelse(is.na(num.stores), num.stores[match(Closest.City, City)], num.stores))

Output

# A tibble: 8 x 6
# Groups:   State [2]
  State   City   City.Population num.stores Closest.City assigned.stores
  <chr>   <chr>            <dbl>      <dbl> <chr>                  <dbl>
1 State A City_1             523          5 NA                         5
2 State A City_2             456         NA City_1                     5
3 State A City_3            1230         52 NA                        52
4 State A City_4             780         NA City_1                     5
5 State B City_5             788         NA City_6                    15
6 State B City_6             111         15 NA                        15
7 State B City_7             897         NA City_6                    15
8 State B City_8               5         48 NA                        48

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Gnueghoidune

'get minimum value difference and case in data.table R

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]