Search for the number of classes k that minimizes the absolute error between the mean of the data and the mean of the frequency table
I wrote this function, which returns a frequency table divided into k classes, the mean of the data, and the mean of the frequency table.
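Data_Frame <- function(x, k) {
  # At least 2 classes are required; raise an error instead of printing
  if (k <= 1) {
    stop("insert a number greater than 1")
  }
  data_range <- range(x)
  # length.out guarantees exactly k + 1 cut points
  cutting_values <- seq(from = data_range[1],
                        to = data_range[2],
                        length.out = k + 1)
  lower_bounds <- cutting_values[1:k]
  upper_bounds <- cutting_values[2:(k + 1)]
  counts <- numeric(length = k)
  # Use i as the loop index so the argument k is not reused
  for (i in seq_along(counts)) {
    if (i < k) {
      # Half-open class [lower, upper): boundary values are counted once
      counts[i] <- sum(x >= lower_bounds[i] & x < upper_bounds[i])
    } else {
      # The last class is closed so that max(x) is included
      counts[i] <- sum(x >= lower_bounds[i] & x <= upper_bounds[i])
    }
  }
  DF <- data.frame(low.bounds = lower_bounds,
                   up.bounds = upper_bounds,
                   freq = counts)
  # Mean of the raw data
  Data_m <- mean(x)
  # Mean of the frequency table: class midpoints weighted by frequency
  DF_m <- sum((DF$low.bounds + DF$up.bounds) / 2 * DF$freq) /
    sum(DF$freq)
  result <- list(DF = DF, Data_m = Data_m, DF_m = DF_m)
  return(result)
}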
### Code for using functions
set.seed(4321)
x <- rnorm(1000, 10, 2)
k <- 5L
result_function_1 <- Data_Frame(x, k)
print(result_function_1)
I have to write a second function that searches for the value of k that minimizes the absolute error between the mean of the data (Data_m) and the mean of the frequency table (DF_m). It should try every k from 2 to max_K and return the optimal k. Could someone help me out?
thanks
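One possible approach is a plain exhaustive search: compute the absolute error for every k from 2 to max_K and keep the k with the smallest one. The sketch below is self-contained and only an illustration. The names grouped_mean_error and Best_K are hypothetical, and the helper recomputes the grouped mean with findInterval() instead of calling Data_Frame, assuming the same equal-width classes and midpoint formula.

```r
# Hypothetical helper: absolute error between mean(x) and the grouped mean
# obtained from k equal-width classes (midpoints weighted by frequency).
grouped_mean_error <- function(x, k) {
  breaks <- seq(min(x), max(x), length.out = k + 1)
  # Class index of each value; all.inside = TRUE keeps max(x) in class k
  idx <- findInterval(x, breaks, all.inside = TRUE)
  freq <- tabulate(idx, nbins = k)
  midpoints <- (breaks[1:k] + breaks[2:(k + 1)]) / 2
  abs(mean(x) - sum(midpoints * freq) / sum(freq))
}

# Hypothetical name: try every k in 2..max_K and return the one with the
# smallest absolute error, together with the full table of errors.
Best_K <- function(x, max_K) {
  if (max_K < 2) {
    stop("max_K must be at least 2")
  }
  ks <- 2:max_K
  errors <- vapply(ks, function(k) grouped_mean_error(x, k), numeric(1))
  list(best_k = ks[which.min(errors)],
       errors = data.frame(k = ks, abs_error = errors))
}

set.seed(4321)
x <- rnorm(1000, 10, 2)
best <- Best_K(x, max_K = 30)
print(best$best_k)
print(head(best$errors))
```

which.min() returns the first minimum, so ties are resolved in favour of the smallest k. The error is not necessarily monotone in k, which is why the sketch evaluates every candidate rather than stopping early.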
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
