Slow population of a dataframe
I'm still very new to R and am noticing that populating a data frame is very slow.
For each row in my dataset, I want to add rows to the data frame, with the number of rows given by the value in column $population.
It should end up with around 700,000 rows, but after 10 minutes of processing it has loaded only about 77,000, which seems really slow.
Code as per below:
df <- data.frame(Ints=integer())
for(i in 1:nrow(popDemo)) {
  row <- popDemo[i,]
  # Use a while value to loop
  j <- 1
  while (j <= row$population) {
    df[nrow(df) + 1,] <- row$age
    j = j+1
  }
}
Any guidance greatly appreciated
Thanks
Solution 1:[1]
Starting with a simple popDemo,
popDemo <- data.frame(population=c(3,5), age=c(1,10))
Your code produces
df <- data.frame(Ints=integer())
for (i in 1:nrow(popDemo)) {
  row <- popDemo[i,]
  # Use a while value to loop
  j <- 1
  while (j <= row$population) {
    df[nrow(df) + 1,] <- row$age
    j = j+1
  }
}
df
# Ints
# 1 1
# 2 1
# 3 1
# 4 10
# 5 10
# 6 10
# 7 10
# 8 10
This can be done much faster in one step:
data.frame(Ints = rep(popDemo$age, times = popDemo$population))
# Ints
# 1 1
# 2 1
# 3 1
# 4 10
# 5 10
# 6 10
# 7 10
# 8 10
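The loop is slow largely because each `df[nrow(df) + 1,] <- ...` grows the data frame by one row, which generally forces R to copy what has been built so far. A minimal sketch of the vectorised version with a sanity check follows, assuming the two-row `popDemo` above stands in for the real data (the names `ints` and `df` are only illustrative):

```r
# Toy stand-in for the asker's data (assumption: same column names)
popDemo <- data.frame(population = c(3, 5), age = c(1, 10))

# Build the whole column in one vectorised call:
# each age is repeated as many times as its population value
ints <- rep(popDemo$age, times = popDemo$population)
df <- data.frame(Ints = ints)

# Sanity check: the result should have one row per unit of population
nrow(df) == sum(popDemo$population)
```

Because the full vector is created once and the data frame is built from it in a single step, nothing is grown inside a loop.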
If you have more columns and want to repeat each whole row rather than a single column, you can index the data frame by repeated row numbers instead:
popDemo <- data.frame(population=c(3,5), age=c(1,10), ltr=c("a","b"))
popDemo[ rep(seq_len(nrow(popDemo)), times = popDemo$population), ]
# population age ltr
# 1 3 1 a
# 1.1 3 1 a
# 1.2 3 1 a
# 2 5 10 b
# 2.1 5 10 b
# 2.2 5 10 b
# 2.3 5 10 b
# 2.4 5 10 b
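The repeated-index approach leaves row names such as `1.1` and `2.4`. If plain sequential row names are preferred, one possible follow-up is sketched below; the name `expanded` is hypothetical, and dropping the `population` column is an optional assumption about what the final table should contain:

```r
# Same toy data as above
popDemo <- data.frame(population = c(3, 5), age = c(1, 10), ltr = c("a", "b"))

# Repeat each row index according to its population value
idx <- rep(seq_len(nrow(popDemo)), times = popDemo$population)
expanded <- popDemo[idx, ]

# Reset row names to the default 1..n sequence
rownames(expanded) <- NULL

# Optional: drop the count column once it has served its purpose
expanded$population <- NULL

expanded
```

Using `seq_len(nrow(popDemo))` rather than `1:nrow(popDemo)` also behaves sensibly when the input has zero rows.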
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | r2evans |
