Predict memory consumption for rfImpute

I am building a tool that uses rfImpute, and I have no control over the data sets users will ultimately feed it, so chances are they will run into a memory issue like the one described here: Memory error using rfimpute from randomForest package in R

So I would like to predict the expected memory consumption of rfImpute from the size of the data frame passed to it, so that I can fall back to a "cheaper" imputation if required.
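Something like this sketch is what I have in mind; estimate_rfimpute_gb is a placeholder for the estimator I am asking about, and the 200 GB limit is made up:

library(randomForest)

## estimate_rfimpute_gb is a placeholder for the estimator this question
## is about; mem_limit_gb is an arbitrary budget for the machine at hand
impute_safely <- function(x, y, mem_limit_gb = 200) {
  if (estimate_rfimpute_gb(x) > mem_limit_gb) {
    na.roughfix(x)    # cheap median/mode imputation as fallback
  } else {
    rfImpute(x, y)    # proximity-based imputation
  }
}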

The data frame that has caused problems has the following size:

> object.size(input.rf)
14982789 bytes
> dim(input.rf)
[1] 105415     12

Now, if the information in the link above is correct, rfImpute would create an object with 105415^2 cells. But how do I best estimate the required memory, given that the columns could be a wild mix of int, numeric, chr, ...? With the most basic approach, assuming 8 bytes per "cell", I get this:

> (105415^2)*8/(1024^3)
[1] 82.79325
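For reference, that back-of-the-envelope as a function, assuming the dominant allocation is a single n x n matrix of doubles (which is what I understand the proximity matrix used by rfImpute to be):

## one n x n matrix of doubles at 8 bytes per cell, in GiB
prox_matrix_gb <- function(n) n^2 * 8 / 1024^3
prox_matrix_gb(105415)
## [1] 82.79325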

But in reality I am exceeding 83 GB of RAM usage significantly (in fact I am running out of memory on a machine with 256 GB of RAM). This estimate also significantly underestimates the size of input.rf itself:

> 105415*12*8-object.size(input.rf)
-4862949 bytes

And scaling the actual object size, dividing by the column count and multiplying by the row count, also produces too low a number (remember, I am exceeding ~250 GB, and this is about 122 GB if I'm not mistaken):

> object.size(input.rf)/12*105415
131617558536.25 bytes
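In case it helps with a better per-cell figure, I can at least measure how the bytes are distributed across the column types:

## per-column breakdown of the measured size, to see which column types
## (int, numeric, chr, factor, ...) dominate the bytes per row
col_bytes <- vapply(input.rf, function(col) as.numeric(object.size(col)), numeric(1))
data.frame(class         = vapply(input.rf, function(col) class(col)[1], character(1)),
           bytes         = col_bytes,
           bytes_per_row = col_bytes / nrow(input.rf))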

Any suggestions on how to get a more realistic estimate of the required memory?

Thanks, Mark


