Predict memory consumption for rfImpute
I am building a tool that uses rfImpute. Since I have no control over the data sets users will ultimately feed it, chances are they will run into a memory issue like the one described here:
Memory error using rfimpute from randomForest package in R
So I would like to predict the expected memory consumption of rfImpute from the size of the data frame passed to it, so that I can default to a "cheaper" imputation if required.
The dataframe which has caused problems has the following size:
> object.size(input.rf)
14982789 bytes
> dim(input.rf)
[1] 105415 12
Now if the information in the link above is correct, rfImpute would create an object with 105415^2 entries, but how do I best estimate the required memory, given that the columns could be a wild mix of int, numeric, chr, ...?
With the most basic approach, assuming 8 bytes per "cell", I get this:
> (105415^2)*8/(1024^3)
[1] 82.79325
But in reality I am significantly exceeding 83 GB of RAM usage (in fact I am running out of memory on a machine with 256 GB of RAM). This estimate also significantly underestimates the size of input.rf itself:
> 105415*12*8-object.size(input.rf)
-4862949 bytes
And using the actual object size, scaled by dividing by the column count and multiplying by the row count, also produces too low a number (remember, I am exceeding ~250 GB, and this is about 122 GiB if I'm not mistaken):
> object.size(input.rf)/12*105415
131617558536.25 bytes
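For context, this is the kind of guard I have in mind: a rough sketch, assuming the dominant allocation is the n × n double-precision proximity matrix scaled by a safety factor for intermediate copies. The `copy.factor` of 4, the 200 GB budget, and the `target` response column are all assumptions for illustration, not measured constants:

```r
## Rough upper-bound estimate, assuming the n x n double-precision proximity
## matrix dominates and that rfImpute makes a few copies of it internally
## (copy.factor is a guessed safety margin, not a measured value).
estimate_rfimpute_bytes <- function(df, copy.factor = 4) {
  n <- nrow(df)
  n^2 * 8 * copy.factor + as.numeric(object.size(df))
}

## Hypothetical fallback: if the estimate exceeds an arbitrary memory
## budget, fall back to a cheap median/mode imputation instead.
budget <- 200 * 1024^3  # 200 GB, chosen arbitrarily
if (estimate_rfimpute_bytes(input.rf) > budget) {
  input.imputed <- randomForest::na.roughfix(input.rf)
} else {
  input.imputed <- randomForest::rfImpute(target ~ ., input.rf)
}
```

The open question is what `copy.factor` (or a better-shaped formula) would make such an estimate realistic.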
Any suggestions how to get a more realistic estimate of the required memory?
Thanks, Mark
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
