'get minimum value difference and case in data.table R
I have a data.table in R that looks like this:
State City City.Population num.stores
state A City_1 523 5
state A City_2 456 NA
state A City_3 1230 52
state A City_4 780 NA
state B City_5 788 NA
state B City_6 111 15
state B City_7 897 NA
state B City_8 5 48
and I would like to get for each City another city's num.stores by comparing another city's population within a state. For example in City_1 from State A I would get that City 2 from State A would be more similar in population than City 3, because the difference between their populations is 67 (between City_1 and City_2 both from state A) compared to 707 (City_1 vs City_3), therefore assigning 5 stores to City_2. My end result would look like this:
State City City.Population num.stores assigned.stores similar_pop_city
state A City_1 523 5 5
state A City_2 456 NA 5 City_1 (City_1 is closer in population and not null)
state A City_3 1230 52 52
state A City_4 780 NA 52 City_1 (City_1 is closer in population and not null)
state B City_5 788 NA 15 City_6 (City_6 is closer in population and not null)
state B City_6 111 15 15
state B City_7 897 NA 15 City_6 (City_6 is closer in population and not null)
state B City_8 5 48 5
I tried to do something as following but I'm still missing the correct logic to execute it:
d.f[, which.min(City.Population -City.Population), .(State)]
but is returning only 1's.
P.D I should also exclude the same City when comparing against the other cities within the state
Solution 1:[1]
I'm not sure if I got the wrong output or if your desired output still has some errors in it (for example, for City_4 you assigned City_4 (itself) as the Closest.City).
I started by assigning Closest.City of City not having NA in num.stores to City containing NA in num.stores. I afterwards created the column assigned.stores, copying num.stores from City with known num.stores, and otherwise assigning num.stores of the Closest.City.
Data
State <- c(rep("State A", 4), rep("State B", 4))
City <- c("City_1", "City_2", "City_3", "City_4", "City_5", "City_6", "City_7", "City_8")
City.Population <- c(523, 456, 1230, 780, 788, 111, 897, 5)
num.stores <- c(5, NA, 52, NA, NA, 15, NA, 48)
d.f <- data.frame(State, City, City.Population, num.stores)
Code
d.f %>%
group_by(State) %>%
mutate(Closest.City = ifelse(is.na(num.stores),
apply(sapply(City.Population[!is.na(num.stores)], function(i) abs(i - City.Population[!is.na(num.stores)])), 2,
function(n) City[which(n == sort(n)[2])]), NA)) %>%
mutate(assigned.stores = ifelse(is.na(num.stores), num.stores[match(Closest.City, City)], num.stores))
Output
# A tibble: 8 x 6
# Groups: State [2]
State City City.Population num.stores Closest.City assigned.stores
<chr> <chr> <dbl> <dbl> <chr> <dbl>
1 State A City_1 523 5 NA 5
2 State A City_2 456 NA City_1 5
3 State A City_3 1230 52 NA 52
4 State A City_4 780 NA City_1 5
5 State B City_5 788 NA City_6 15
6 State B City_6 111 15 NA 15
7 State B City_7 897 NA City_6 15
8 State B City_8 5 48 NA 48
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Gnueghoidune |
