Slow population of a dataframe

I'm still very new to R and am noticing that populating a dataframe is very slow.

For each row in my dataset I want to add rows to the dataframe, with the number of rows given by the value in that row's population column.

It should end up with around 700,000 rows, but after 10 minutes of processing it has only loaded about 77,000, which seems really slow.

Code is below:

df <- data.frame(Ints=integer())

for (i in 1:nrow(popDemo)) {
    row <- popDemo[i, ]

    # Use a while value to loop
    j <- 1
    while (j <= row$population) {
        df[nrow(df) + 1, ] <- row$age
        j <- j + 1
    }
}

Any guidance greatly appreciated

Thanks


Solution 1:[1]

Starting with a simple popDemo,

popDemo <- data.frame(population=c(3,5), age=c(1,10))

Your code produces

df <- data.frame(Ints=integer())
for (i in 1:nrow(popDemo)) {
  row <- popDemo[i,]       
  # Use a while value to loop
  j <- 1
  while (j <= row$population) {
    df[nrow(df) + 1,] <- row$age
    j = j+1
  }
}
df
#   Ints
# 1    1
# 2    1
# 3    1
# 4   10
# 5   10
# 6   10
# 7   10
# 8   10

This can be done much faster in one step:

data.frame(Ints = rep(popDemo$age, times = popDemo$population))
#   Ints
# 1    1
# 2    1
# 3    1
# 4   10
# 5   10
# 6   10
# 7   10
# 8   10
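
The loop is slow because each `df[nrow(df) + 1, ] <- ...` assignment extends the data frame by one row, which typically forces R to copy it, so the cost of each step grows with the number of rows already stored. Here is a rough sketch for checking that both approaches agree and for timing them yourself. The test data and the helper names `grow_by_row` and `expand_with_rep` are made up for illustration and are not from the question.

```r
# Hypothetical test data: 50 age groups, 20 people each (1,000 rows in total)
popDemo <- data.frame(population = rep(20, 50), age = 1:50)

# Original approach: append one row at a time
grow_by_row <- function(pop) {
  df <- data.frame(Ints = integer())
  for (i in seq_len(nrow(pop))) {
    for (j in seq_len(pop$population[i])) {
      df[nrow(df) + 1, ] <- pop$age[i]
    }
  }
  df
}

# Vectorised approach: build the whole column in one call
expand_with_rep <- function(pop) {
  data.frame(Ints = rep(pop$age, times = pop$population))
}

slow <- grow_by_row(popDemo)
fast <- expand_with_rep(popDemo)

# Both approaches should give the same values
all(slow$Ints == fast$Ints)

# Timings depend on the machine, so none are quoted here
system.time(grow_by_row(popDemo))
system.time(expand_with_rep(popDemo))
```

Increasing the number of rows in the test data should make the gap between the two timings more obvious.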

If you have more columns and just want to repeat whole rows, you can index the data frame with repeated row numbers instead:

popDemo <- data.frame(population=c(3,5), age=c(1,10), ltr=c("a","b"))
popDemo[ rep(seq_len(nrow(popDemo)), times = popDemo$population), ]
#     population age ltr
# 1            3   1   a
# 1.1          3   1   a
# 1.2          3   1   a
# 2            5  10   b
# 2.1          5  10   b
# 2.2          5  10   b
# 2.3          5  10   b
# 2.4          5  10   b
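
The repeated rows keep row names such as `1.1` and `2.4`, and the `population` column is still present. If neither is wanted, a small follow-up along these lines should tidy the result. The names `idx` and `expanded` are made up for this sketch.

```r
popDemo <- data.frame(population = c(3, 5), age = c(1, 10), ltr = c("a", "b"))

# Repeat each row index as many times as its population value
idx <- rep(seq_len(nrow(popDemo)), times = popDemo$population)

# Keep every column except the count itself
expanded <- popDemo[idx, setdiff(names(popDemo), "population"), drop = FALSE]

# Replace row names like "1.1" with plain 1..n
rownames(expanded) <- NULL

expanded
```

`drop = FALSE` keeps the result a data frame even if only one column is left after removing `population`.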

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 r2evans