How to split a dataset (CSV) into training and test data
How can I split a dataset (CSV) into training and test data in Python if it contains no dependent variable?
The project I am currently working on is machine-learning based, and the dataset does not contain a dependent (target) variable. The following code works only if the dataset has a dependent variable:
from sklearn.model_selection import train_test_split
xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size=0.2, random_state=0)
I expect the split to happen without any y variable.
Is it possible?
Solution 1:[1]
To split the dataset into train and test sets, we can shuffle the entire dataset first and then slice it according to the required training size.
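import pandas as pd
# Load the CSV; "data.csv" is a placeholder file name.
df = pd.read_csv("data.csv")
# Shuffle all rows; random_state makes the shuffle reproducible.
shuffle = df.sample(frac=1, random_state=0)
# The first 80% of the shuffled rows form the training set, the rest the test set.
train_size = int(0.8 * len(df))
train = shuffle[:train_size]
test = shuffle[train_size:]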
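As an alternative sketch, `train_test_split` accepts any number of array-likes, so it can also be called with the features alone and then returns just two pieces. The small in-memory DataFrame and its column names `feature_a` and `feature_b` are hypothetical stand-ins for the CSV, which is not shown in the question.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the real data; in practice load the CSV
# with pd.read_csv(...) instead.
df = pd.DataFrame({"feature_a": range(10), "feature_b": range(10, 20)})

# With a single input, train_test_split returns only (train, test);
# no y argument is needed.
train, test = train_test_split(df, test_size=0.2, random_state=0)

print(len(train), len(test))
```

The function shuffles by default, so no separate shuffling step is needed, and `random_state` keeps the split reproducible between runs.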
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Smaurya |
