Constrained clustering of samples of different size
I have n samples of sizes s1, s2, ..., sn, which may or may not follow the same distribution.
I would like to group them into K groups, where K >= 3.
For context, I found some ideas, mostly from here:
https://stats.stackexchange.com/questions/223275/classification-of-samples-into-two-groups?rq=1
I've picked the package conclust, which might solve my problem, but the issue is that my samples have different sizes. So, adapting their code sample (https://rdrr.io/cran/conclust/man/ckmeans.html):
library(plyr)
library(conclust)
sample1 <- c(0, 0, 2)
sample2 <- c(1, 0, 3, 4, 2, 1)
sample3 <- c(1, 1)
sample4 <- c(0, 1, 6)
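# Turn each sample into a one-row matrix so the rows can be stacked
sample_list <- list(matrix(sample1, nrow = 1), matrix(sample2, nrow = 1), matrix(sample3, nrow = 1), matrix(sample4, nrow = 1))
# rbind.fill.matrix pads the shorter rows with NA
data <- rbind.fill.matrix(sample_list)
# Constraints are given as pairs of row indices
mustLink <- matrix(c(1, 2), nrow = 1)
cantLink <- matrix(c(1, 4), nrow = 1)
k <- 2
pred <- ckmeans(data, k, mustLink, cantLink)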
pred
Error in if (best == -1 || dd[j] < dd[best]) { :
missing value where TRUE/FALSE needed
I can easily work around the error by adding data[is.na(data)] <- FALSE (which fills the missing entries with 0), but that feels wrong: the small samples would then contain many 0 values and would be clustered together even if they are different, wouldn't they?
Long story short: what would be the way to do constrained clustering on samples of different sizes in R?
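One possible direction, sketched here as an assumption rather than an established answer: instead of padding the samples to a common length, summarise each sample as a fixed-length feature vector (mean, standard deviation, a few quantiles) and cluster those vectors. The helper summarise_sample and the choice of summary statistics are hypothetical and not part of conclust. The ckmeans call uses the same arguments as in the code above and is guarded so the sketch still runs when the package is not installed.

```r
# Hypothetical helper (not part of conclust): map a numeric sample of any
# length to a fixed-length feature vector of summary statistics.
summarise_sample <- function(x, probs = c(0.25, 0.5, 0.75)) {
  out <- c(
    mean(x),
    if (length(x) > 1) sd(x) else 0,          # sd() is NA for a single value
    quantile(x, probs = probs, names = FALSE)
  )
  names(out) <- c("mean", "sd", paste0("q", probs * 100))
  out
}

samples <- list(
  c(0, 0, 2),
  c(1, 0, 3, 4, 2, 1),
  c(1, 1),
  c(0, 1, 6)
)

# One row per sample, one column per summary statistic; no NA padding needed.
features <- t(sapply(samples, summarise_sample))
print(features)

# Same constraints as in the question: rows 1 and 2 must be linked,
# rows 1 and 4 must not be linked.
mustLink <- matrix(c(1, 2), nrow = 1)
cantLink <- matrix(c(1, 4), nrow = 1)

# Only call ckmeans if conclust is available.
if (requireNamespace("conclust", quietly = TRUE)) {
  pred <- conclust::ckmeans(features, k = 2, mustLink, cantLink)
  print(pred)
}
```

Which statistics to use is a modelling choice: they should capture whatever makes two samples "similar" for the problem at hand, and columns on very different scales may need rescaling before clustering.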
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
