Data splitting: ordinal grouped data with custom probability of outcomes

The caret package has a data-splitting function, createDataPartition, which can sample data while preserving the relative frequency of each outcome class. I am looking for something similar that can also preserve groups and handle ordinal data.

I want to specify a target distribution for my outcomes. Keeping groups intact, I want to find which groups to use in a follow-up experiment so that their pooled outcomes match that target distribution (rather than simply the current distribution). My code below attempts this in a very blunt way:

# Load data
library(rethinking)
data(Trolley)
d <- Trolley

# Inspect current distribution of ratings
d$response <- factor(d$response)
round(summary(d$response)/dim(d)[1],2)

# Find 5 cases that roughly have my target distribution
targetdist <- c(0.3,0.1,0.1,0.1,0.1,0.1,0.1) # Arbitrary goal

# Unique cases
uniqcase <- unique(d$case)

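# Poor method: brute-force random search
runs <- 100
difmatrix <- matrix(NA, runs, 2)  # column 1: seed, column 2: distance to target
for (i in 1:runs) {
  # Draw 5 random cases; the seed is stored so the draw can be reproduced later
  difmatrix[i, 1] <- i
  set.seed(i)
  casetests <- sample(uniqcase, 5)
  datasub <- subset(d, case %in% casetests)

  # Sum of absolute differences between the subset's rating proportions and the target
  difmatrix[i, 2] <- sum(abs(round(summary(datasub$response) / dim(datasub)[1], 2) - targetdist))
}
difmatrix[which.min(difmatrix[, 2]), ]  # best seed and its distance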

# Look at best distribution
set.seed(which.min(difmatrix[,2]))
casetests <- sample(uniqcase, 5)
datasub <- subset(d, case %in% casetests)
round(summary(datasub$response)/dim(datasub)[1],2) # Current best distribution

In this toy example, the overall distribution in the data is:

0.13 0.09 0.11 0.23 0.15 0.15 0.15

My target distribution is

0.30 0.10 0.10 0.10 0.10 0.10 0.10

but the best subset I find has:

0.21 0.12 0.12 0.22 0.12 0.11 0.10
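
To make the gap concrete, the score that the loop minimises (the sum of absolute differences between two sets of proportions) can be computed directly from the rounded figures quoted above. This is only a small arithmetic check; `l1` is an illustrative helper name, not part of caret or rethinking.

```r
# Rounded proportions quoted in the post
target  <- c(0.30, 0.10, 0.10, 0.10, 0.10, 0.10, 0.10)
overall <- c(0.13, 0.09, 0.11, 0.23, 0.15, 0.15, 0.15)
best    <- c(0.21, 0.12, 0.12, 0.22, 0.12, 0.11, 0.10)

# Sum of absolute differences (hypothetical helper)
l1 <- function(p, q) sum(abs(p - q))

l1(overall, target)  # about 0.47
l1(best, target)     # about 0.28
```

So the random search moves the distribution closer to the target, but still leaves a sizeable gap.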

I cannot help but think there is a better way to do it. In my actual case, I want to select about 200 from a pool of 10,000, so it seems unlikely that random sampling will stumble on a good choice.
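
One possible alternative to random search is a greedy forward selection: tabulate the response counts per group once, then repeatedly add whichever group brings the pooled distribution closest to the target. The sketch below is a heuristic, not an optimal solver and not a caret feature. The function name `select_groups`, its arguments, and the simulated data are all made up for illustration. Note also that the target vector in the post sums to 0.9, so the sketch rescales it to sum to 1, which is an assumption about the intent.

```r
# Greedy forward selection of whole groups toward a target outcome distribution.
# Illustrative sketch: `select_groups` and its argument names are hypothetical.
select_groups <- function(group, response, target, k) {
  # Response counts per group (rows: groups, columns: response levels)
  counts <- unclass(table(group, response))
  target <- target / sum(target)  # rescale in case the target does not sum to 1
  stopifnot(ncol(counts) == length(target), k <= nrow(counts))

  chosen  <- integer(0)             # row indices of the groups selected so far
  running <- numeric(ncol(counts))  # pooled counts of the selected groups

  for (step in seq_len(k)) {
    candidates <- setdiff(seq_len(nrow(counts)), chosen)
    # Pooled counts that would result from adding each candidate group
    pooled <- sweep(counts[candidates, , drop = FALSE], 2, running, "+")
    props  <- pooled / rowSums(pooled)
    # Sum of absolute differences to the target, one value per candidate
    dist   <- rowSums(abs(sweep(props, 2, target, "-")))
    best   <- candidates[which.min(dist)]
    chosen  <- c(chosen, best)
    running <- running + counts[best, ]
  }

  props <- running / sum(running)
  list(groups = rownames(counts)[chosen],
       props = props,
       distance = sum(abs(props - target)))
}

# Simulated stand-in data: 30 groups, 40 ratings each, 7 ordinal levels
set.seed(1)
n_groups <- 30
group <- rep(seq_len(n_groups), each = 40)
response <- unlist(lapply(seq_len(n_groups), function(g) {
  p <- rgamma(7, shape = 1)  # each group gets its own random rating profile
  sample(1:7, 40, replace = TRUE, prob = p)
}))
response <- factor(response, levels = 1:7)

target <- c(0.3, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1)  # as in the post; rescaled inside
res <- select_groups(group, response, target, k = 5)
res$groups
round(res$props, 2)
res$distance
```

With the Trolley data this would presumably be called as `select_groups(d$case, d$response, targetdist, k = 5)`. The distance used here ignores the ordering of the ratings; replacing it with the sum of absolute differences between cumulative proportions would be one way to take the ordinal structure into account.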

Thanks for reading. I hope this makes sense. I have been working on it for a while, yet I still have trouble formulating it concisely.

r partitioning caret ordinal

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
