Is there a way to get the index of a list in R without match or which?
I am trying to detect anomalies in the iris dataset. I normalise the data into iris_norm, split that into a training set and a testing set, then use the knn function to find anomalies (test rows whose predicted class differs from the true class). I can extract those anomalies from the normalised iris_test set, but not from the original iris set. Is there a way to use the indexes of the rows in 'actual' as indexes into iris? Here is my code:
library(gmodels)
library(class)
library(tidyverse)
# STEP 1: Import your dataset, look at a summary
summary(iris)
# STEP 2: Generate a random number to split the dataset.
ran <- sample(1:nrow(iris), 0.9 * nrow(iris))
# The normalization function is created
nor <- function(x) {(x - min(x)) / (max(x) - min(x))}
# Run normalisation on predictor columns
iris_norm <- as.data.frame(lapply(iris[,c(1,2,3,4)], nor))
##extract training set
iris_train <- iris_norm[ran,]
##extract testing set
iris_test <- iris_norm[-ran,]
# Extract 5th column of train dataset because it will be used as
#'cl' argument in knn function.
iris_target_category <- iris[ran,5]
##extract 5th column of test dataset to measure the accuracy
iris_test_category <- iris[-ran,5]
##run knn function
pr <- knn(iris_train,iris_test,cl=iris_target_category,k=15)
##create confusion matrix
tab <- table(pr,iris_test_category)
##this function divides the number of correct predictions by the total number
#of predictions, which tells us how accurate the model is (as a percentage).
accuracy <- function(x){sum(diag(x)) / sum(x) * 100}
accuracy(tab)
#create a cross table to see where the wrong predictions are
mytab <- CrossTable(iris_test_category, pr, FALSE)
#anomaly indexes
anomalies_index <- which(iris_test_category != pr)
# get the anomaly values
anomaly_value1 <- iris_test[iris_test_category != pr, "Sepal.Length"]
anomaly_value2 <- iris_test[iris_test_category != pr, "Sepal.Width"]
anomaly_value3 <- iris_test[iris_test_category != pr, "Petal.Length"]
anomaly_value4 <- iris_test[iris_test_category != pr, "Petal.Width"]
anomalies <- data.frame(anomaly_value1, anomaly_value2,
anomaly_value3, anomaly_value4)
actual <- iris_test[anomalies_index,]
print(anomalies)
print(actual)
Solution 1:[1]
I found the solution a few minutes later. All I had to do was:
actual_index <- as.numeric(rownames(actual))
iris[actual_index,]
With that, I was able to extract the correct rows from the original iris data.
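A minimal sketch of why the row-name approach works, assuming iris_norm has the default integer row names (which is the case for a data frame built with as.data.frame(lapply(...))): subsetting with [ran, ] or [-ran, ] keeps each row's original name, so the row names of actual are the row positions in iris. The seed, the flagged vector, and the names test_rows, idx_from_names and idx_from_positions are illustrative assumptions, not part of the original code. The second route keeps the positions explicitly and does not rely on row names.

```r
# Assumption: seed added only so the sketch is reproducible
set.seed(42)

# Same normalisation and split as in the question
nor <- function(x) (x - min(x)) / (max(x) - min(x))
iris_norm <- as.data.frame(lapply(iris[, 1:4], nor))
ran <- sample(seq_len(nrow(iris)), 0.9 * nrow(iris))
iris_test <- iris_norm[-ran, ]

# Hypothetical stand-in for the rows flagged by knn: the first three test rows
flagged <- c(1, 2, 3)
actual <- iris_test[flagged, ]

# Route 1: recover the original positions from the preserved row names
idx_from_names <- as.integer(rownames(actual))

# Route 2: track the test-row positions explicitly
# (setdiff keeps the order of its first argument, matching iris_norm[-ran, ])
test_rows <- setdiff(seq_len(nrow(iris)), ran)
idx_from_positions <- test_rows[flagged]

# Both routes should point at the same rows of the original data
print(identical(idx_from_names, idx_from_positions))
print(iris[idx_from_names, ])
```

If the data frame had custom, non-numeric row names, as.integer(rownames(...)) would return NA, so tracking the positions explicitly is the safer choice in general.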
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Mennatuallah Ghonem |
