How to run kmeans and dbscan in R using all the processing power and memory of a Linux Ubuntu 20.04 server (mapply, apply, parApply, etc.)?

I have access to a Linux server with the following configuration:

I have a matrix whose values follow a gamma distribution in the range [0, 12000], with Col=30000 and Row=8000. In this matrix, 23% of the values are zero.
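
For experimenting at a smaller scale, a matrix with these properties can be simulated roughly as follows. This is only a sketch: the gamma `shape` and `scale`, the reduced dimensions, and the name `sim.mtx` are assumptions, since only the value range and the share of zeros are known.

```r
set.seed(1)

n_row <- 300   # scaled down from the real dimensions (assumption)
n_col <- 80

# Gamma draws capped at 12000; shape and scale are illustrative guesses
vals <- pmin(rgamma(n_row * n_col, shape = 2, scale = 1500), 12000)
sim.mtx <- matrix(vals, nrow = n_row, ncol = n_col)

# Set a random 23% of the cells to zero
zero_idx <- sample(length(sim.mtx), size = round(0.23 * length(sim.mtx)))
sim.mtx[zero_idx] <- 0

mean(sim.mtx == 0)   # share of zero cells
range(sim.mtx)       # smallest and largest value
```

Timing each step on a matrix of this size first makes it easier to see which step dominates before moving to the full data.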

I need to run k-means and DBSCAN on this matrix in R.

Procedure: k-means and DBSCAN

# Step 1
# res.mtx <- matrix with Col = 8000 and Row = 30000; the values in each row lie in the range [0, 12000] and follow a gamma distribution.

# Step 2
# res.mtx.z <- res.mtx with 23% of its values set to zero.

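# Step 3: log-transform, then min-max scale to [0, 1]
minMax <- function(x) {
    # Note: log1p(x + 1) equals log(x + 2); use log1p(x) if log(x + 1) was intended
    x <- log1p(x + 1)
    (x - min(x)) / (max(x) - min(x))
}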

# Step 4
res.std <- apply(res.mtx.z, 2, minMax)  # Time difference of 0.3446261 mins

# Step 5
res.dst <- dist(res.std)  # Time difference > 3 days; only 1 core was used during the process

# Step 6
set.seed(1)
res.K <- kmeans(res.dst, 100)$cluster  # Time difference > 5 days; only 1 core was used during the process

# Step 7
library(dbscan)  # provides dbscan()
cluster <- dbscan(res.dst, minPts = 3, eps = 20)$cluster  # Time difference > 5 days; only 1 core was used during the process
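
One possible way to use more than one core for step 6: base R's `kmeans()` is single-threaded and takes a numeric data matrix as input, so it does not need the distance object from step 5. Independent random starts can be run in parallel with `parallel::mclapply()` (fork-based, which works on Linux), keeping the fit with the lowest total within-cluster sum of squares. The sketch below uses toy data; `x`, `n_starts`, and the number of centers are placeholders, and it is untested at the full matrix size.

```r
library(parallel)

set.seed(1)
x <- matrix(runif(200 * 5), nrow = 200)   # toy stand-in for res.std

n_starts <- 4
n_avail <- detectCores()
if (is.na(n_avail)) n_avail <- 1L
n_cores <- max(1L, min(n_starts, n_avail - 1L))

# One k-means run per random start, each in its own forked worker
fits <- mclapply(seq_len(n_starts), function(i) {
  set.seed(i)   # a different, reproducible start per worker
  kmeans(x, centers = 3, nstart = 1, iter.max = 50)
}, mc.cores = n_cores)

# Keep the run with the smallest total within-cluster sum of squares
wss <- sapply(fits, function(f) f$tot.withinss)
best <- fits[[which.min(wss)]]
clusters <- best$cluster
```

This parallelizes across restarts rather than inside a single run, so it improves the quality obtained per unit of wall-clock time but does not shorten one individual `kmeans()` call.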

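For step 7, `dbscan()` from the dbscan package also accepts the data matrix itself instead of a `dist` object, which avoids building and storing all pairwise distances first. A `dist` object for n rows holds n(n-1)/2 doubles, which the first lines below estimate for n = 30000. This is a sketch on toy data; `x`, `eps = 1`, and the two-group layout are illustrative assumptions, and how fast the neighbor search is with thousands of columns is not something this example shows.

```r
library(dbscan)

# Approximate size in GiB of a dist object for n rows (8 bytes per double)
n <- 30000
dist_gib <- n * (n - 1) / 2 * 8 / 1024^3

set.seed(1)
# Toy stand-in for res.std: two well-separated groups of 50 points in 4 dimensions
x <- rbind(matrix(rnorm(50 * 4, mean = 0, sd = 0.1), ncol = 4),
           matrix(rnorm(50 * 4, mean = 5, sd = 0.1), ncol = 4))

# Pass the data matrix directly; no dist() call is needed
fit <- dbscan(x, eps = 1, minPts = 3)
cluster <- fit$cluster   # 0 marks noise points
table(cluster)
```
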
Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
