How to do a K-fold cross-validation that takes the samples into account
I have a numeric matrix to classify (45 rows, 102 columns): the first column holds the class (0 or 1), the second holds the sample identifier, and the remaining 100 columns hold the measured values.
Here is a simplified example of the first two columns: class 0 comes from 6 samples and class 1 from 9 samples, with 3 rows per sample.
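import numpy as np

# 18 rows of class 0 followed by 27 rows of class 1
classes = np.concatenate((np.repeat(0, 18), np.repeat(1, 27)), axis=None)
# 3 rows per sample: samples 1-6 for class 0, samples 1-9 for class 1
samples = np.concatenate((np.repeat(list(range(1, 7)), 3), np.repeat(list(range(1, 10)), 3)))
np.column_stack((classes, samples))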
Output (columns: class, sample):
[0, 1]
[0, 1]
[0, 1]
[0, 2]
[0, 2]
[0, 2]
[0, 3]
[0, 3]
[0, 3]
[0, 4]
[0, 4]
[0, 4]
[0, 5]
[0, 5]
[0, 5]
[0, 6]
[0, 6]
[0, 6]
[1, 1]
[1, 1]
[1, 1]
[1, 2]
[1, 2]
[1, 2]
[1, 3]
[1, 3]
[1, 3]
[1, 4]
[1, 4]
[1, 4]
[1, 5]
[1, 5]
[1, 5]
[1, 6]
[1, 6]
[1, 6]
[1, 7]
[1, 7]
[1, 7]
[1, 8]
[1, 8]
[1, 8]
[1, 9]
[1, 9]
[1, 9]
I know about scikit-learn's KFold, but I want a 5-fold cross-validation that takes the sample into account, i.e. all rows obtained from a given sample must end up together in exactly one of the train, validation, or test sets.
Is there a Python function for this, or do I have to do it by hand?
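One possible approach is scikit-learn's `GroupKFold`, which keeps every row that shares a group label in the same fold. The sketch below is not a definitive answer and rests on some assumptions: the placeholder matrix `X` stands in for the 100 measured-value columns, and because sample numbers restart at 1 within each class, the `groups` array is built here by combining class and sample (`classes * 100 + samples` is just one illustrative encoding).

```python
import numpy as np
from sklearn.model_selection import GroupKFold

# Rebuild the two label columns from the question.
classes = np.concatenate((np.repeat(0, 18), np.repeat(1, 27)))
samples = np.concatenate((np.repeat(np.arange(1, 7), 3),
                          np.repeat(np.arange(1, 10), 3)))

# Assumption: sample 1 of class 0 and sample 1 of class 1 are different
# physical samples, so combine both columns into one unique group label.
groups = classes * 100 + samples

# Placeholder for the measured values (hypothetical data, zeros only).
X = np.zeros((len(classes), 100))

cv = GroupKFold(n_splits=5)
folds = list(cv.split(X, classes, groups))

for fold, (train_idx, test_idx) in enumerate(folds):
    # Groups present on both sides of the split; should always be empty.
    shared = set(groups[train_idx]) & set(groups[test_idx])
    print(fold, len(train_idx), len(test_idx), sorted(shared))
```

If the class balance within each fold also matters, `StratifiedGroupKFold` (available in recent scikit-learn versions) can be swapped in for `GroupKFold` with the same `split(X, y, groups)` call. A separate validation set can be carved out the same way by splitting the training indices again with their corresponding group labels.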
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow