NAs Produced When the Number of Iterations Increases

I am trying to write a function that generates random paths for a Travelling Salesman Problem.

Here is the data:

library(tidyverse)  # provides tibble() and lead(), used in the loop below

final_data = data.frame(Longitude = rnorm(19, 40,1), Lattitude = rnorm(19, -65,1))

final_data$id = 1:19


N <- nrow(final_data)

dists <- outer(seq_len(N), seq_len(N), function(a,b) {
    geosphere::distHaversine(final_data[a,1:2], final_data[b,1:2]) })

D <- as.matrix(dists)

####


d = 19
fix_num <- 1

Here is the loop that generates 100 random paths:

results <- list()
resultss <- list()


for (i in 1:100)

{
start.time <- Sys.time()



iteration = i

relations_i = tibble(
  from = c(fix_num, sample(setdiff(1:d, fix_num))),
  to = lead(from, default=from[1]),
)

relations_i = data.frame(relations_i)
relations_i$iteration = i

m<-dists

my_sums_i = function(relations_i, m) {
  r = as.matrix(relations_i)
  if(mode(r) != "numeric") mode(r) = "numeric"
  sum(m[r])}

m_i = my_sums_i(relations_i, m)

end.time <- Sys.time()
time.taken_i <- end.time - start.time

m_i$time = time.taken_i

 results[[i]] <- m_i

 resultss[[i]] <- relations_i

}

results_1 <- data.frame(do.call(rbind.data.frame, results))
results_2 <- data.frame(do.call(rbind.data.frame, resultss))


####

This seems to work fine. The problem appears when I increase the number of paths (e.g. 1000 iterations instead of 100):

results <- list()
resultss <- list()


for (i in 1:1000)

{
start.time <- Sys.time()



iteration = i

relations_i = tibble(
  from = c(fix_num, sample(setdiff(1:d, fix_num))),
  to = lead(from, default=from[1]),
)

relations_i = data.frame(relations_i)
relations_i$iteration = i

m<-dists

my_sums_i = function(relations_i, m) {
  r = as.matrix(relations_i)
  if(mode(r) != "numeric") mode(r) = "numeric"
  sum(m[r])}

m_i = my_sums_i(relations_i, m)

end.time <- Sys.time()
time.taken_i <- end.time - start.time

m_i$time = time.taken_i

 results[[i]] <- m_i

 resultss[[i]] <- relations_i

}

results_1 <- data.frame(do.call(rbind.data.frame, results))
results_2 <- data.frame(do.call(rbind.data.frame, resultss))

  • For some reason, when there are more paths, NAs appear in the results.

  • Is there a way to fix this?

Thank you!



Solution 1:[1]

If you want to locate a bug, don't loop through your code 100 times all at once, but go through each line once, manually. After every step, check the resulting values.

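Look at the subroutine my_sums_i(): you pass it the data frame relations_i and the matrix dists. relations_i has three columns: from, to, and iteration. It seems you want to look up the distance between from and to with sum(m[r]), but r still contains the values from the iteration column. R only reads an index matrix as (row, column) pairs when it has exactly two columns; with three columns, r is flattened and every value, including the iteration number, is used as a linear index into m. The matrix dists (aka m) has only 19*19 = 361 values, so as soon as the iteration number exceeds 361 the lookup returns NA and the sum becomes NA. That is why your code fails exactly after iteration 361. For the earlier iterations the sum is not NA, but it is not the path length either.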
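The lookup behaviour described above can be reproduced with a small base-R sketch. The 3 x 3 matrix and the index values here are made up purely for illustration; only the indexing rule itself carries over to the 19 x 19 case.

```r
# Toy 3 x 3 "distance" matrix (hypothetical values, filled column by column).
m <- matrix(1:9, nrow = 3)

# Two-column index matrix: each row is read as a (row, col) pair.
pairs <- cbind(from = c(1, 2), to = c(2, 3))
pair_lookup <- m[pairs]          # picks m[1, 2] and m[2, 3]

# A third column breaks the (row, col) interpretation: the index matrix is
# flattened and every value is used as a linear index into m.
with_iter <- cbind(pairs, iteration = c(10, 10))
linear_lookup <- m[with_iter]    # indices 1, 2, 2, 3, 10, 10

print(pair_lookup)
print(linear_lookup)             # positions past length(m) come back as NA
print(sum(linear_lookup))        # a single NA makes the whole sum NA
```

Because m has only 9 elements, the index 10 plays the same role here that iteration 362 plays against the 361-element matrix in the question.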

The following fix is enough to solve your issue:

my_sums_i <- function(relations_i, dists) {
    r = as.matrix( relations_i[c('to','from')] )
    sum( dists[r] )
}

You don't have to redefine this function in every loop iteration. Define it once, outside your loop. Functions also protect your outside-of-function variables from change: arguments are local to the function, so you don't have to keep redefining them (as with m <- dists inside the loop).
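As a minimal sketch of that point, the function below is defined once and deliberately modifies its dists argument. The names path_length, toy_dists and toy_path, and the doubling step, are hypothetical and exist only to show that the caller's matrix is left untouched.

```r
# Defined once, before any loop (toy data, base R only).
path_length <- function(relations, dists) {
  dists <- dists * 2   # illustrative only: changes the local copy, not the caller's matrix
  sum(dists[as.matrix(relations[c("from", "to")])])
}

toy_dists <- matrix(c(0, 5, 5, 0), nrow = 2)
toy_path  <- data.frame(from = c(1, 2), to = c(2, 1), iteration = 1)

doubled <- path_length(toy_path, toy_dists)   # 2 * (5 + 5)
print(doubled)
print(toy_dists)                              # still the original values
```

Selecting the from and to columns by name inside the function also means the extra iteration column can stay in the data frame without affecting the lookup.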

I had to unclutter your code a bit before I could understand it. Here is the full working code to study:

library("tidyverse")

N <- 19
final_data <- data.frame(longitude = rnorm(N, 40,1), latitude = rnorm(N, -65,1), id = seq_len(N))

dists <- outer(seq_len(N), seq_len(N), function(a,b) {
    geosphere::distHaversine(final_data[a,1:2], final_data[b,1:2]) })

fix_num <- 1

results <- list()
resultss <- list()

my_sums_i <- function(relations_i, dists) {
    r = as.matrix(relations_i[c('to','from')])
    sum( dists[r] )
}


for (i in 1:1000) {
    
    start.time <- Sys.time()

    relations_i <- tibble(
        from = c(fix_num, sample(setdiff(seq_len(N), fix_num))),
        to = lead(from, default=from[1]),
        iteration = i
    )

    end.time <- Sys.time()

    m_i <- list()
    m_i[['distance']] = my_sums_i(relations_i, dists)
    m_i[['time']] <- end.time - start.time

    results[[i]] <- m_i
    resultss[[i]] <- relations_i

}

results_1 <- data.frame(do.call(rbind.data.frame, results))
results_2 <- data.frame(do.call(rbind.data.frame, resultss))
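
One way to confirm that the two-column lookup no longer produces NAs is a quick check on toy data. This is a sketch under stated assumptions, not part of the code above: it uses base R only, replaces the Haversine matrix with Euclidean distances from dist(), and the helper tour_length is a hypothetical name.

```r
set.seed(1)                        # reproducible toy run
N <- 19

# Hypothetical symmetric distance matrix standing in for the Haversine one.
pts <- matrix(rnorm(2 * N), ncol = 2)
toy_dists <- as.matrix(dist(pts))

tour_length <- function(from, dists) {
  to <- c(from[-1], from[1])       # next stop, wrapping back to the start
  sum(dists[cbind(from, to)])      # exactly two columns: (row, col) lookup
}

# 1000 random tours that all start at node 1.
lengths <- vapply(seq_len(1000), function(i) {
  tour_length(c(1, sample(2:N)), toy_dists)
}, numeric(1))

print(anyNA(lengths))              # FALSE: no NA regardless of iteration count
print(summary(lengths))
```

The iteration counter never enters the index matrix here, so the number of tours can grow past 361 without touching positions outside the distance matrix.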

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Caspar V.