Split Train & Test Sets but Indexed Input Differs from Subscript by 1: Why?
I've split my data into training and testing sets, but I keep receiving this error:
! Must subset rows with a valid subscript vector.
ℹ Logical subscripts must match the size of the indexed input.
x Input has size 4067 but subscript `split_data_table == 0` has size 4066.
My data is named "JFK_weather_clean2". To execute the split, I did:
set.seed(1234)
split_data_table <- sample(c(rep(0, 0.8 * nrow(JFK_weather_clean2)), rep(1, 0.2 * nrow(JFK_weather_clean2))))
table(split_data_table) results:
| 0 | 1 |
|---|---|
| 3253 | 813 |
From there I tried to create the training set:
training_set <- JFK_weather_clean2[split_data_table == 0, ]
As you have probably noticed, my input data has 4,067 rows (a count that I assume includes the header row), whereas the subscript has size 4,066. I am assuming the issue involves the header row, but I don't know what to correct in my sample() code. Thanks for any help!
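For what it's worth, the header row is probably not the cause: nrow() counts data rows only, and column names are not a row. The counts in the table above point to rounding instead. 0.8 * 4067 is 3253.6 and 0.2 * 4067 is 813.4, and rep() truncates a fractional count, so the two pieces add up to 3253 + 813 = 4066. Below is a minimal sketch of the arithmetic and one possible fix. It assumes a stand-in data frame `df` with 4067 rows in place of JFK_weather_clean2, and the names `n_train` and `split_flag` are hypothetical.

```r
# Hypothetical stand-in for JFK_weather_clean2: any data frame with 4067 rows.
n <- 4067
df <- data.frame(id = seq_len(n))

# Why the original approach comes up one short:
# 0.8 * n and 0.2 * n are not whole numbers, and rep() truncates both counts.
n_train_old <- length(rep(0, 0.8 * n))
n_test_old  <- length(rep(1, 0.2 * n))
n_train_old + n_test_old  # one fewer than nrow(df)

# One possible fix: fix the training count explicitly and give
# every remaining row to the test set, so the lengths always match.
set.seed(1234)
n_train <- floor(0.8 * n)
split_flag <- sample(rep(c(0, 1), times = c(n_train, n - n_train)))

# The logical subscript now has exactly one entry per row.
length(split_flag) == nrow(df)
training_set <- df[split_flag == 0, , drop = FALSE]
testing_set  <- df[split_flag == 1, , drop = FALSE]
```

With the real data, replacing `n` with nrow(JFK_weather_clean2) and `df` with JFK_weather_clean2 should give a subscript of the right length. Computing the test count as the remainder, rather than as a second rounded product, is what keeps the two pieces from drifting apart.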
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
