NAs Produced When the Number of Iterations Increases
I am trying to write a function that generates random paths for a Travelling Salesman Problem.
Here is the data:
final_data = data.frame(Longitude = rnorm(19, 40,1), Lattitude = rnorm(19, -65,1))
final_data$id = 1:19
N <- nrow(final_data)
dists <- outer(seq_len(N), seq_len(N), function(a,b) {
  geosphere::distHaversine(final_data[a,1:2], final_data[b,1:2]) })
D <- as.matrix(dists)
####
d = 19
fix_num <- 1
Here is the function for 100 random paths:
results <- list()
resultss <- list()
for (i in 1:100)
{
  start.time <- Sys.time()
  iteration = i
  relations_i = tibble(
    from = c(fix_num, sample(setdiff(1:d, fix_num))),
    to = lead(from, default=from[1]),
  )
  relations_i = data.frame(relations_i)
  relations_i$iteration = i
  m<-dists
  my_sums_i = function(relations_i, m) {
    r = as.matrix(relations_i)
    if(mode(r) != "numeric") mode(r) = "numeric"
    sum(m[r])}
  m_i = my_sums_i(relations_i, m)
  end.time <- Sys.time()
  time.taken_i <- end.time - start.time
  m_i$time = time.taken_i
  results[[i]] <- m_i
  resultss[[i]] <- relations_i
}
results_1 <- data.frame(do.call(rbind.data.frame, results))
results_2 <- data.frame(do.call(rbind.data.frame, resultss))
####
This seems to work fine. The problem appears when I increase the number of paths (e.g. 1000 iterations instead of 100):
results <- list()
resultss <- list()
for (i in 1:1000)
{
  start.time <- Sys.time()
  iteration = i
  relations_i = tibble(
    from = c(fix_num, sample(setdiff(1:d, fix_num))),
    to = lead(from, default=from[1]),
  )
  relations_i = data.frame(relations_i)
  relations_i$iteration = i
  m<-dists
  my_sums_i = function(relations_i, m) {
    r = as.matrix(relations_i)
    if(mode(r) != "numeric") mode(r) = "numeric"
    sum(m[r])}
  m_i = my_sums_i(relations_i, m)
  end.time <- Sys.time()
  time.taken_i <- end.time - start.time
  m_i$time = time.taken_i
  results[[i]] <- m_i
  resultss[[i]] <- relations_i
}
results_1 <- data.frame(do.call(rbind.data.frame, results))
results_2 <- data.frame(do.call(rbind.data.frame, resultss))
For some reason, when there are more paths, NAs are produced in the results.
Is there a way to fix this?
Thank you!
Solution 1:[1]
If you want to locate a bug, don't run the whole loop 100 times at once. Step through a single iteration line by line, and check the resulting values after every step.
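As a rough sketch of that kind of check, the snippet below mimics the question's lookup with a random stand-in matrix (an assumption, not the real Haversine distances) and a fixed tour, then reports the first iteration whose total is NA. The names totals and first_bad are hypothetical.

```r
set.seed(1)
N <- 19
# Assumption: random values stand in for the real distance matrix.
dists <- matrix(runif(N * N), nrow = N)

# Same index layout as the question: from, to and iteration in one matrix.
totals <- sapply(1:1000, function(i) {
  r <- cbind(from = 1:N, to = c(2:N, 1), iteration = i)
  sum(dists[r])
})

# First iteration whose total is NA
first_bad <- which(is.na(totals))[1]
first_bad
```

Knowing the first failing iteration narrows the search to whatever changes between that iteration and the one before it.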
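Looking at the subroutine my_sums_i(), you're passing the data frame relations_i and the matrix dists as arguments. relations_i has three columns: from, to, and iteration. It seems you want to look up the distance between from and to with sum(m[r]), but r still contains the values from the iteration column. Because r then has three columns instead of one per dimension of m, R does not treat its rows as (row, column) pairs; it uses every value in r as a plain position in the matrix. The matrix dists (aka m) only has 19*19 = 361 values, so once iteration exceeds 361 the lookup returns NA, and your code fails exactly after iteration 361. The sums for the earlier iterations are not NA, but they are not path lengths either.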
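To illustrate the difference, here is a minimal sketch on a hypothetical 3 x 3 matrix (the values and the names pairs and triples are made up for the example). A two-column index matrix selects (row, column) cells, while a three-column one is flattened into plain positions.

```r
# Small stand-in for the distance matrix (hypothetical values).
m <- matrix(1:9, nrow = 3)

# Two columns, one per dimension: each row is a (row, column) pair.
pairs <- cbind(from = c(1, 2), to = c(2, 3))
m[pairs]          # m[1, 2] and m[2, 3]

# Three columns: no longer one per dimension, so R flattens the index
# matrix and uses every value as a position in m.
triples <- cbind(pairs, iteration = 10)
m[triples]        # positions 1, 2, 2, 3, 10, 10; position 10 does not exist

sum(m[pairs])     # a number
sum(m[triples])   # NA, because 10 > length(m)
```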
The following fix is enough to solve your issue:
my_sums_i <- function(relations_i, dists) {
  r = as.matrix( relations_i[c('to','from')] )
  sum( dists[r] )
}
You don't have to redefine this function in every loop iteration. Just place it outside your loop. Functions also protect variables outside the function from being changed, so you don't have to keep redefining those either.
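A small sketch of that scoping behaviour, using a hypothetical variable counter and function add_one() that are not part of the original code: assigning to an argument inside a function changes only the function's local copy.

```r
# Hypothetical names, used only for illustration.
counter <- 1

add_one <- function(counter) {
  counter <- counter + 1   # modifies the function's local copy only
  counter
}

inside <- add_one(counter)
inside    # value returned by the function
counter   # the outer variable is unchanged
```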
I had to unclutter your code a bit before I could understand it. Here is the full working code to study:
library("tidyverse")  # provides tibble() and lead()

N <- 19
final_data <- data.frame(longitude = rnorm(N, 40,1), latitude = rnorm(N, -65,1), id = seq_len(N))

# Pairwise Haversine distances between all points (N x N matrix)
dists <- outer(seq_len(N), seq_len(N), function(a,b) {
  geosphere::distHaversine(final_data[a,1:2], final_data[b,1:2]) })

fix_num <- 1
results <- list()
resultss <- list()

# Total length of one path: only 'to' and 'from' are used as (row, column) indices
my_sums_i <- function(relations_i, dists) {
  r = as.matrix(relations_i[c('to','from')])
  sum( dists[r] )
}

for (i in 1:1000) {
  start.time <- Sys.time()
  relations_i <- tibble(
    from = c(fix_num, sample(setdiff(seq_len(N), fix_num))),
    to = lead(from, default=from[1]),
    iteration = i
  )
  m_i <- list()
  m_i[['distance']] = my_sums_i(relations_i, dists)
  end.time <- Sys.time()  # stop the clock after the distance is computed, as in the question
  m_i[['time']] <- end.time - start.time
  results[[i]] <- m_i
  resultss[[i]] <- relations_i
}

results_1 <- data.frame(do.call(rbind.data.frame, results))
results_2 <- data.frame(do.call(rbind.data.frame, resultss))
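For comparison, here is a base-R sketch of the same idea that needs neither tidyverse nor geosphere. It is not the code above: it assumes a Euclidean stand-in from dist() instead of Haversine distances, and the names coords, tour_length and tour_lengths are hypothetical. It builds the two-column index directly with cbind() and checks that no NA appears over 1000 paths.

```r
set.seed(42)
N <- 19
fix_num <- 1

# Assumption: Euclidean distances on random coordinates stand in for
# the Haversine matrix used in the answer.
coords <- cbind(rnorm(N, 40, 1), rnorm(N, -65, 1))
dists <- as.matrix(dist(coords))

# Length of a closed tour that visits the points in the order given by `from`.
tour_length <- function(from, dists) {
  to <- c(from[-1], from[1])     # each point leads to the next; the last returns to the start
  sum(dists[cbind(from, to)])    # two columns, so rows are (row, column) pairs
}

tour_lengths <- vapply(seq_len(1000), function(i) {
  tour_length(c(fix_num, sample(setdiff(seq_len(N), fix_num))), dists)
}, numeric(1))

anyNA(tour_lengths)   # FALSE: every lookup stays inside the matrix
```

Because the index matrix never carries the iteration counter, the number of paths has no effect on the lookup.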
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Caspar V. |
