How to train a model in SageMaker Studio with .train and .test extension dataset files?

I'm trying to implement ML models in Amazon SageMaker Studio. The model I want to use is from Hugging Face, and it uses a dataset from the CoNLL Corpora.

Following the instructions in the Hugging Face documentation, I have to read a CSV file with this instruction: train = pd.read_csv. The problem is the dataset's file extensions: the files end in .train and .test rather than .csv. The error I'm getting is: "ParserError: Error tokenizing data. C error: Expected 1 fields in line 13, saw 3"

Is there a way to convert .test files to CSV files? If not, how should I read files with these extensions?

Links

Dataset: https://www.kaggle.com/nltkdata/conll-corpora

Model: https://huggingface.co/mrm8488/bert-spanish-cased-finetuned-ner



Solution 1:[1]

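The dataset in your link seems to be tab-separated, not comma-separated. pd.read_csv splits on commas by default, so it decided from the first lines that each row has a single field and then failed on line 13, where the commas in the text produced 3 fields.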

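The file extension itself does not matter to pd.read_csv, so there is no need to convert the files first. You can read them using the right delimiter, like df = pd.read_csv("<filename>", sep="\t")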
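
To make this concrete, here is a minimal sketch of the approach. It writes a tiny made-up sample file (hypothetical content, not the real dataset), prints the first line with repr() so the delimiter is visible, reads it with sep="\t", and saves a regular CSV. The file names, the column names token/pos/ner, and the UTF-8 encoding are assumptions for illustration. If repr() shows spaces instead of \t in your file, change sep to match, and adjust the encoding if pandas raises a decode error.

```python
import csv
import os
import tempfile

import pandas as pd

# Hypothetical three-line sample standing in for a real .train file.
sample = "Melbourne\tNP\tB-LOC\n(\tFpa\tO\nAustralia\tNP\tB-LOC\n"

tmp_dir = tempfile.mkdtemp()
train_path = os.path.join(tmp_dir, "sample.train")
with open(train_path, "w", encoding="utf-8") as f:
    f.write(sample)

# Inspect the raw first line: repr() shows tabs as \t and spaces as spaces.
with open(train_path, encoding="utf-8") as f:
    print(repr(f.readline()))

df = pd.read_csv(
    train_path,
    sep="\t",                        # assumed delimiter; match what repr() showed
    header=None,                     # the sample has no header row
    names=["token", "pos", "ner"],   # assumed column names
    quoting=csv.QUOTE_NONE,          # treat quote characters as ordinary tokens
    keep_default_na=False,           # keep tokens such as "NA" as plain strings
)

# Optional: write a comma-separated copy if a real .csv file is needed.
csv_path = os.path.join(tmp_dir, "sample.csv")
df.to_csv(csv_path, index=False)
```

By default pandas skips blank lines, so if the file uses empty lines to separate sentences, those boundaries are dropped in the resulting DataFrame.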

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 durga_sury