How to split a dataset (CSV) into training and test data

How can I split a dataset (CSV) into training and test data in Python if it contains no dependent variable?

The project I am currently working on is machine-learning based, and the dataset does not contain a dependent variable. As far as I can tell, the following code works only if the dataset contains a dependent variable:

from sklearn.model_selection import train_test_split
xTrain, xTest, yTrain, yTest = train_test_split(x, y, test_size = 0.2, random_state = 0)

I want the split to happen without any y variable. Is that possible?



Solution 1:[1]

To split the dataset into train and test sets, we can shuffle the entire dataset first and then slice it according to the required size.

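import pandas as pd

# df is assumed to be the DataFrame already loaded from the CSV (e.g. with pd.read_csv)
# Shuffle all rows: frac=1 returns 100% of the rows in random order
shuffle = df.sample(frac=1)

# Number of rows for the training set (80% of the data)
train_size = int(0.8 * len(df))

# The first 80% of the shuffled rows are the training set, the rest the test set
train = shuffle[:train_size]
test = shuffle[train_size:]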
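
As an alternative to shuffling and slicing by hand, train_test_split itself accepts any number of arrays, so it can also be called with only the feature data and no y. A minimal sketch, assuming scikit-learn and pandas are installed; the small DataFrame below is a made-up stand-in for the data loaded from the CSV:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical example data; in practice df would come from pd.read_csv(...)
df = pd.DataFrame({"a": range(10), "b": range(10, 20)})

# Passing a single array returns exactly two pieces: train and test.
# random_state fixes the shuffle so the split is reproducible.
train, test = train_test_split(df, test_size=0.2, random_state=0)

print(len(train), len(test))
```

With 10 rows and test_size=0.2, this gives 8 training rows and 2 test rows, and no row appears in both sets.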

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Smaurya