How to create a dataset from three files on disk with the datasets library in Python?

I have three files on disk named train.xlsx, validation.xlsx, and test.xlsx, and I need to build a dataset from them with the datasets library. Here is my code:

from google.colab import drive
from datasets import Dataset
import pandas as pd
drive.mount('/content/drive')
train_data = pd.read_excel('/content/drive/My Drive/NLP-Datasets/Question2_Data/train.xlsx')
validation_data = pd.read_excel('/content/drive/My Drive/NLP-Datasets/Question2_Data/valid.xlsx')
test_data = pd.read_excel('/content/drive/My Drive/NLP-Datasets/Question2_Data/test.xlsx')

print(train_data.shape)
print(validation_data.shape)
print(test_data.shape)

Now I need a dataset with these keys, each built from the corresponding file: dataset['train'], dataset['validation'], and dataset['test']. Could anyone help me?



Solution 1:[1]

Try this


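# Convert each DataFrame to a plain Python list of rows.
train_data = train_data.values.tolist()
validation_data = validation_data.values.tolist()
test_data = test_data.values.tolist()

# Collect the three lists under one key per split.
d = {'train_data': train_data,
     'validation_data': validation_data,
     'test_data': test_data}

# Note: pd.DataFrame raises a ValueError unless all three lists have the same length.
df = pd.DataFrame(data=d)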

Note that .values.tolist() is meant for DataFrames with a single column. If a DataFrame has more than one column, select the column you need first, e.g. train_data['COLUMN'].values.tolist()

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1