Constrained clustering of samples of different size

I have n samples of sizes s1, s2, ..., sn, which may or may not follow the same distribution. I would like to group them into K groups, where K >= 3.
For context, I found some ideas mostly from here:
https://stats.stackexchange.com/questions/223275/classification-of-samples-into-two-groups?rq=1

I picked the conclust package, which might solve my problem, but the issue is that my samples have different sizes. Adapting its code sample (https://rdrr.io/cran/conclust/man/ckmeans.html), I get the error shown after the code:

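library(plyr)      # provides rbind.fill.matrix()
library(conclust)  # provides ckmeans()

# Four samples of different sizes
sample1 <- c(0, 0, 2)
sample2 <- c(1, 0, 3, 4, 2, 1)
sample3 <- c(1, 1)
sample4 <- c(0, 1, 6)

# One row per sample; rbind.fill.matrix() pads the shorter rows with NA
sample_list <- list(matrix(sample1, nrow = 1), matrix(sample2, nrow = 1), matrix(sample3, nrow = 1), matrix(sample4, nrow = 1))
data <- rbind.fill.matrix(sample_list)

# Constraints: samples 1 and 2 must share a cluster, samples 1 and 4 must not
mustLink <- matrix(c(1, 2), nrow = 1)
cantLink <- matrix(c(1, 4), nrow = 1)
k <- 2
pred <- ckmeans(data, k, mustLink, cantLink)
pred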
Error in if (best == -1 || dd[j] < dd[best]) { : 
  missing value where TRUE/FALSE needed

I can easily work around the error by adding data[is.na(data)] <- FALSE, but that feels wrong: the NA padding is coerced to 0, so the small samples would contain lots of 0 values and would be clustered together even if they are different, wouldn't they?
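To make the concern concrete, here is a small base-R sketch (not from the conclust documentation) that zero-pads two of the samples above and compares Euclidean distances. The helper pad_zero() is a hypothetical name introduced only for this illustration. It suggests that the distance depends on the padding and on the arbitrary order of values within a sample, not only on how the samples are distributed.

```r
sample1 <- c(0, 0, 2)
sample2 <- c(1, 0, 3, 4, 2, 1)

# Hypothetical helper: pad a vector with zeros up to length n
pad_zero <- function(x, n) c(x, rep(0, n - length(x)))

n_max <- max(length(sample1), length(sample2))
p1 <- pad_zero(sample1, n_max)

# Distance between sample1 and sample2 in their original order
d_original <- as.numeric(dist(rbind(p1, pad_zero(sample2, n_max))))

# Same values in sample2, only sorted: the sample is unchanged as a
# set of observations, yet the distance to sample1 is different
d_sorted <- as.numeric(dist(rbind(p1, pad_zero(sort(sample2), n_max))))

print(c(original = d_original, sorted = d_sorted))
```

Because the columns of the padded matrix do not correspond to meaningful features, k-means on it compares positions rather than distributions.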

In short, what would be the way to do constrained clustering on samples of different sizes in R?
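One possible direction, offered only as a sketch under assumptions: describe each sample by a fixed number of summary statistics so that every row has the same length and no NA values. The choice of five quantiles below is an assumption made for illustration, not something prescribed by conclust, and other summaries may suit the data better. The final call is left commented out and simply reuses the ckmeans() call from the question.

```r
samples <- list(
  c(0, 0, 2),
  c(1, 0, 3, 4, 2, 1),
  c(1, 1),
  c(0, 1, 6)
)

# Assumed feature set: minimum, quartiles and maximum of each sample
probs <- c(0, 0.25, 0.5, 0.75, 1)

# sapply() returns one column per sample, so transpose to one row per sample
features <- t(sapply(samples, quantile, probs = probs, names = FALSE))
colnames(features) <- paste0("q", probs * 100)

print(features)

# The matrix is complete, so it could be passed on as in the question, e.g.:
# pred <- conclust::ckmeans(features, k = 2,
#                           mustLink = matrix(c(1, 2), nrow = 1),
#                           cantLink = matrix(c(1, 4), nrow = 1))
```

With this representation the row order of the constraints still refers to sample indices, so the mustLink and cantLink matrices can stay as they are.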



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
