R function taking too long to run in Python, can I optimize it?

I'm doing some data analysis in Python, but at one point I need to transform the data with an expectation-maximization algorithm from an R package, which imputes values for the zeros.

I start my code by importing all the libraries I need:

import numpy as np
import pandas as pd

# Installing and importing the R package zCompositions
from rpy2 import robjects
import rpy2.robjects.packages as rpackages
utils = rpackages.importr('utils')
utils.chooseCRANmirror(ind=1)
utils.install_packages("zCompositions")
zCompositions = rpackages.importr("zCompositions")

# Enabling automatic conversion between pandas/NumPy objects and
# R matrices, vectors and data frames
from rpy2.robjects import pandas2ri
import rpy2.robjects.numpy2ri
rpy2.robjects.numpy2ri.activate()
pandas2ri.activate()

Then I import my data and manipulate it to get a (452, 50) DataFrame called df and a (452, 50) DataFrame called LOD. Since I couldn't reproduce the problem with a mock-up DataFrame, I've linked the full df and LOD DataFrames in my GitHub.

Then I run the function I need from the R package:

X_lrEM = zCompositions.lrEM(df, label=0, dl=LOD, ini_cov="multRepl")

This is the documentation of the R package.

The problem is that this function usually takes 1-2 minutes to run directly in R, but 9-18 minutes when called from Python on Google Colab. Is there anything I can do (while staying in a Python environment) to optimize this runtime?
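To find out whether the extra minutes are spent in the rpy2 conversion layer or in the R computation itself, it helps to time each stage separately. A minimal helper (hypothetical, pure Python):

```python
import time

def timed(label, fn, *args, **kwargs):
    """Call fn, print how long it took, and return its result."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    print(f"{label}: {time.perf_counter() - start:.2f}s")
    return result

# Hypothetical usage, wrapping the R call from above:
# X_lrEM = timed("lrEM", zCompositions.lrEM, df, label=0, dl=LOD, ini_cov="multRepl")
```

If the `lrEM` call itself accounts for nearly all of the wall-clock time, the slowdown is inside R (or Colab's hardware) rather than in rpy2's data conversion.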



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
