How should I fix "error in knn(): 'train' and 'class' have different lengths"?

I am attempting to use the knn() function in the class package to solve a problem. I have split the iris dataset into 50% training data and 50% test data. I am attempting to predict the variety variable using sepal width and petal width. My knn() call is as follows:

> predictions <- knn(iris.train[, c(1:2)], iris.test[, c(1:2)], iris.train[, 3], k = 10)

In this instance, columns 1 and 2 of iris.train and iris.test are sepal width and petal width. Column 3 of both datasets is the variety variable, stored as a factor. I keep getting the error that 'train' and 'class' have different lengths. When I check the dimensions of what I pass into the function, this is what I get:

> dim(iris.train[, c(1:2)])
[1] 75  2

> dim(iris.test[, c(1:2)])
[1] 75  2

> dim(iris.train[, 3])
[1] 75  1

The row counts all match, so I assume I'm missing something. How can I resolve the issue of 'train' and 'class' being different lengths? Thank you to anyone who can help!



Solution 1:[1]

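The cl argument should be a factor or vector whose length equals the number of rows in train. Your dim() output shows that iris.train[, 3] is still a one-column data frame rather than a vector (this is what happens when iris.train is a tibble, for example, because single-bracket indexing on a tibble never drops to a vector). The length of a data frame is its number of columns, so length(iris.train[, 3]) is 1, which is not the same as the 75 rows in train.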

Extract the column itself with [[ ]] so that cl is a factor of length 75. Try this:

predictions <- knn(iris.train[, c(1:2)], iris.test[, c(1:2)], iris.train[[3]], k = 10)
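The difference can be reproduced with a minimal sketch. It assumes the built-in iris data frame rather than the asker's iris.train and iris.test, whose construction is not shown, so the names idx, train, test, cl_frame and cl_vec are illustrative only. Because a base data frame would drop a single column to a vector, drop = FALSE is used here to imitate the one-column frame from the question.

```r
library(class)  # provides knn()

set.seed(1)
# Hypothetical 50/50 split of the built-in iris data (not the asker's split)
idx   <- sample(nrow(iris), 75)
cols  <- c("Sepal.Width", "Petal.Width", "Species")
train <- iris[idx, cols]
test  <- iris[-idx, cols]

# A one-column data frame: its length is the number of columns
cl_frame <- train[, 3, drop = FALSE]
length(cl_frame)   # 1

# [[ ]] extracts the column itself: its length is the number of rows
cl_vec <- train[[3]]
length(cl_vec)     # 75

# With a factor of matching length as cl, knn() runs
predictions <- knn(train[, 1:2], test[, 1:2], cl_vec, k = 10)
length(predictions)  # 75, one prediction per test row
```

Passing cl_frame instead of cl_vec as the third argument should trigger the same "'train' and 'class' have different lengths" error, since knn() compares the length of cl with the number of training rows.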

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 langtang