'Huggingface Load_dataset() function throws "ValueError: Couldn't cast"

My goal is to train a classifier able to do sentiment analysis in Slovak language using loaded SlovakBert model and HuggingFace library. Code is executed on Google Colaboratory.

My test dataset is read from this csv file: https://raw.githubusercontent.com/kinit-sk/slovakbert-auxiliary/main/sentiment_reviews/kinit_golden_games.csv

and train dataset: https://raw.githubusercontent.com/kinit-sk/slovakbert-auxiliary/main/sentiment_reviews/kinit_golden_accomodation.csv

Data has two columns: column of Slovak sentences and 2nd column of labels which indicate sentiment of the sentence. Labels have values -1, 0 or 1.

Load_dataset() function throws this error:

ValueError: Couldn't cast Vrtuľník je veľmi zraniteľný pri dobre mierenej streľbe zo zeme. Brániť sa, unikať, alebo vedieť zneškodniť nepriateľa je vecou sekúnd, ak nie stotín, kedy ide život. : string -1: int64 -- schema metadata -- pandas: '{"index_columns": [{"kind": "range", "name": null, "start": 0, "' + 954 to {'Priestorovo a vybavenim OK.': Value(dtype='string', id=None), '1': Value(dtype='int64', id=None)} because column names don't match

Code:

!pip install transformers==4.10.0 -qqq
!pip install datasets -qqq

from re import M
import numpy as np
from datasets import load_metric, load_dataset, Dataset
from transformers import TrainingArguments, Trainer, AutoModelForSequenceClassification, AutoTokenizer, DataCollatorWithPadding
import pandas as pd
from textblob import TextBlob
from textblob.sentiments import NaiveBayesAnalyzer

#links to dataset
test = 'https://raw.githubusercontent.com/kinit-sk/slovakbert-auxiliary/main/sentiment_reviews/kinit_golden_games.csv'
train = 'https://raw.githubusercontent.com/kinit-sk/slovakbert-auxiliary/main/sentiment_reviews/kinit_golden_accomodation.csv'


model_name = 'gerulata/slovakbert'


#Load data
dataset = load_dataset('csv', data_files={'train': train, 'test': test})

What is done wrong while loading the dataset?

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source

'Huggingface Load_dataset() function throws "ValueError: Couldn't cast"

Sources

Related Questions