How to run k-means and DBSCAN in R using all the processing power and memory of a Linux Ubuntu 20.04 server (mapply, apply, parApply, etc.)?
I have access to a Linux server with the following configuration:
I have one matrix of gamma-distributed values in the range [0:12000], with Col=30000 and Row=8000. In this matrix, 23% of the values are zero.
I need to run k-means and DBSCAN on it in R.
Procedure: k-means and DBSCAN
# step 1
res.mtx <- # create a matrix with Col=8000 and Row=30000; values in each row should be in the range [0:12000], drawn from a gamma distribution
# step 2
res.mtx.z <- # set 23% of the values to zero
# step 3
minMax <- function(x) {
  x <- log1p(x + 1)
  (x - min(x)) / (max(x) - min(x))
}
# step 4
res.std <- apply(res.mtx.z, 2, minMax) # Time difference of 0.3446261 mins
# step 5
res.dst <- dist(res.std) # Time difference > 3 days; only used 1 core during the process
# step 6
set.seed(1)
res.K <- kmeans(res.dst, 100)$cluster # Time difference > 5 days; only used 1 core during the process
# step 7
cluster <- dbscan(res.dst, minPts = 3, eps = 20)$cluster # Time difference > 5 days; only used 1 core during the process
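For anyone who wants to reproduce this at a small scale, here is a minimal sketch of the same steps. Several things in it are assumptions and not part of the original post: the matrix is shrunk to 300 x 80, the gamma `shape` and `scale` values are made up, the number of centers is reduced to 5, and the `eps` value is only illustrative. It uses `mclapply` from the `parallel` package (fork-based, so Linux is fine) for the column-wise scaling and for independent k-means restarts. It also passes the scaled data matrix to `kmeans` directly instead of the `dist` object, which is a swapped-in approach: `kmeans` converts its input with `as.matrix`, so a `dist` object would be expanded into a full square matrix first. If the data really has 30000 rows, `dist` alone stores 30000 * 29999 / 2, roughly 4.5e8, doubles, which is about 3.6 GB.

```r
library(parallel)  # ships with base R

set.seed(1)
n_row <- 300  # scaled down from the question's dimensions (assumption)
n_col <- 80

# Steps 1-2: gamma values capped at 12000, then 23% of the cells set to zero.
# shape and scale are illustrative assumptions, not taken from the question.
vals <- pmin(rgamma(n_row * n_col, shape = 2, scale = 1000), 12000)
res.mtx <- matrix(vals, nrow = n_row, ncol = n_col)
res.mtx.z <- res.mtx
zero_idx <- sample(length(res.mtx.z), size = round(0.23 * length(res.mtx.z)))
res.mtx.z[zero_idx] <- 0

# Step 3: the same scaling function as in the question
minMax <- function(x) {
  x <- log1p(x + 1)
  (x - min(x)) / (max(x) - min(x))
}

# Worker count, capped at 2 for this toy example; raise it on the server
n_cores <- detectCores()
if (is.na(n_cores)) n_cores <- 1L
n_cores <- max(1L, min(2L, n_cores))

# Step 4: scale each column in a forked worker, then bind the columns back
cols <- mclapply(seq_len(n_col), function(j) minMax(res.mtx.z[, j]),
                 mc.cores = n_cores)
res.std <- do.call(cbind, cols)

# Step 6 (variant): k-means on the data itself, one random start per task,
# keeping the fit with the lowest total within-cluster sum of squares
fits <- mclapply(1:4, function(s) {
  set.seed(s)
  kmeans(res.std, centers = 5, nstart = 1, iter.max = 50)
}, mc.cores = n_cores)
best <- fits[[which.min(sapply(fits, function(f) f$tot.withinss))]]
res.K <- best$cluster

# Step 7 (variant): DBSCAN on the data matrix, only if the dbscan package
# is installed; eps here is an illustrative value for the toy data
if (requireNamespace("dbscan", quietly = TRUE)) {
  cluster <- dbscan::dbscan(res.std, eps = 5, minPts = 3)$cluster
}
```

This only spreads the independent pieces (columns, random restarts) across cores; a single `kmeans` or `dist` call still runs on one core, so whether it helps at full size depends on the server's memory and core count.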
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow