(Huggingface transformers) training loss sometimes decreases really slowly (using Trainer)
I'm fine-tuning a sentiment analysis model on news data. Since the simplest approach is to start from a Huggingface pre-trained model (roberta-base), I followed this Huggingface tutorial: https://huggingface.co/blog/sentiment-analysis-python. The custom input data is simple: there are 2 columns named 'text' and 'labels'. The 'text' column consists of news sentences, and the 'labels' column consists of '0' (40%) and '1' (60%). The data was then split into train, eval, and test sets.
This is the problem I ran into: 'eval_loss' never changes during training, although the accuracy is above 50%. The training loss does decrease during training, so the model seems to have learned something. Maybe it stopped learning after the first epoch, or the best checkpoint was selected automatically, but I'm confused about what actually happened.
And this is the training code (without the labeling code):
from datasets import load_dataset
from transformers import AutoTokenizer
from transformers import DataCollatorWithPadding
from transformers import AutoModelForSequenceClassification
import numpy as np
from datasets import load_metric
from transformers import set_seed
set_seed(42)
dataset = load_dataset('json',
                       data_files={'train': './data/labeled_news/labeled_news_heads_train.json',
                                   'eval': './data/labeled_news/labeled_news_heads_eval.json'},
                       field='data')
tokenizer = AutoTokenizer.from_pretrained("roberta-base")
def tokenize_function(examples):
    return tokenizer(examples["text"], padding="max_length", truncation=True)
tokenized_datasets = dataset.map(tokenize_function, batched=True)
train_dataset = tokenized_datasets["train"].shuffle(seed=42)
eval_dataset = tokenized_datasets["eval"].shuffle(seed=42)
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
def compute_metrics(eval_pred):
    load_accuracy = load_metric("accuracy")
    load_f1 = load_metric("f1")
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    accuracy = load_accuracy.compute(predictions=predictions, references=labels)["accuracy"]
    f1 = load_f1.compute(predictions=predictions, references=labels)["f1"]
    return {"accuracy": accuracy, "f1": f1}
from transformers import TrainingArguments, Trainer, EarlyStoppingCallback
repo_name = "Direct_v1"
training_args = TrainingArguments(
    output_dir=repo_name,
    learning_rate=2e-5,
    per_device_train_batch_size=24,
    per_device_eval_batch_size=1,
    num_train_epochs=5,
    weight_decay=0.01,
    save_strategy="steps",
    evaluation_strategy="steps",
    eval_steps=250,
    save_steps=250,
    push_to_hub=False,
    save_total_limit=5,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
trainer.train()
And this is the result printed on console:
Using custom data configuration default-e08b7987c7aa36c3
Reusing dataset json (/home/nvme20142249/.cache/huggingface/datasets/json/default-e08b7987c7aa36c3/0.0.0/ac0ca5f5289a6cf108e706efcf040422dbbfa8e658dee6a819f20d76bb84d26b)
100%|██████████| 2/2 [00:00<00:00, 315.56it/s]
Loading cached processed dataset at /home/nvme20142249/.cache/huggingface/datasets/json/default-e08b7987c7aa36c3/0.0.0/ac0ca5f5289a6cf108e706efcf040422dbbfa8e658dee6a819f20d76bb84d26b/cache-050035fb0e59db40.arrow
Loading cached processed dataset at /home/nvme20142249/.cache/huggingface/datasets/json/default-e08b7987c7aa36c3/0.0.0/ac0ca5f5289a6cf108e706efcf040422dbbfa8e658dee6a819f20d76bb84d26b/cache-2981b391c69b5e0c.arrow
Loading cached shuffled indices for dataset at /home/nvme20142249/.cache/huggingface/datasets/json/default-e08b7987c7aa36c3/0.0.0/ac0ca5f5289a6cf108e706efcf040422dbbfa8e658dee6a819f20d76bb84d26b/cache-26ea42ee0127a8d9.arrow
Loading cached shuffled indices for dataset at /home/nvme20142249/.cache/huggingface/datasets/json/default-e08b7987c7aa36c3/0.0.0/ac0ca5f5289a6cf108e706efcf040422dbbfa8e658dee6a819f20d76bb84d26b/cache-ef064a1251721c99.arrow
Some weights of the model checkpoint at roberta-base were not used when initializing RobertaForSequenceClassification: ['lm_head.layer_norm.weight', 'lm_head.decoder.weight', 'roberta.pooler.dense.bias', 'roberta.pooler.dense.weight', 'lm_head.dense.bias', 'lm_head.dense.weight', 'lm_head.bias', 'lm_head.layer_norm.bias']
- This IS expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing RobertaForSequenceClassification from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Some weights of RobertaForSequenceClassification were not initialized from the model checkpoint at roberta-base and are newly initialized: ['classifier.out_proj.weight', 'classifier.dense.weight', 'classifier.dense.bias', 'classifier.out_proj.bias']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
The following columns in the training set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
/home/nvme20142249/PycharmProjects/StockPrediction/venv/lib/python3.8/site-packages/transformers/optimization.py:306: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set `no_deprecation_warning=True` to disable this warning
warnings.warn(
***** Running training *****
Num examples = 10147
Num Epochs = 5
Instantaneous batch size per device = 24
Total train batch size (w. parallel, distributed & accumulation) = 24
Gradient Accumulation steps = 1
Total optimization steps = 2115
12%|█▏ | 250/2115 [02:04<15:33, 2.00it/s]The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
100%|██████████| 634/634 [00:14<00:00, 53.32it/s]
Saving model checkpoint to Direct_v1/checkpoint-250
Configuration saved in Direct_v1/checkpoint-250/config.json
{'eval_loss': 0.6686041951179504, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 14.2853, 'eval_samples_per_second': 44.381, 'eval_steps_per_second': 44.381, 'epoch': 0.59}
Model weights saved in Direct_v1/checkpoint-250/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-250/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-250/special_tokens_map.json
24%|██▎ | 500/2115 [04:28<14:23, 1.87it/s]The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
{'loss': 0.6803, 'learning_rate': 1.5271867612293146e-05, 'epoch': 1.18}
24%|██▎ | 500/2115 [04:43<14:23, 1.87it/s]
100%|██████████| 634/634 [00:15<00:00, 49.78it/s]
Saving model checkpoint to Direct_v1/checkpoint-500
Configuration saved in Direct_v1/checkpoint-500/config.json
{'eval_loss': 0.6686403751373291, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 15.0809, 'eval_samples_per_second': 42.04, 'eval_steps_per_second': 42.04, 'epoch': 1.18}
Model weights saved in Direct_v1/checkpoint-500/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-500/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-500/special_tokens_map.json
35%|███▌ | 750/2115 [06:56<11:30, 1.98it/s]
The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
35%|███▌ | 750/2115 [07:10<11:30, 1.98it/s]
100%|██████████| 634/634 [00:14<00:00, 51.95it/s]
Saving model checkpoint to Direct_v1/checkpoint-750
Configuration saved in Direct_v1/checkpoint-750/config.json
{'eval_loss': 0.6685948967933655, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 14.3642, 'eval_samples_per_second': 44.138, 'eval_steps_per_second': 44.138, 'epoch': 1.77}
Model weights saved in Direct_v1/checkpoint-750/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-750/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-750/special_tokens_map.json
47%|████▋ | 1000/2115 [09:18<09:18, 2.00it/s]
The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
{'loss': 0.6786, 'learning_rate': 1.054373522458629e-05, 'epoch': 2.36}
47%|████▋ | 1000/2115 [09:32<09:18, 2.00it/s]
100%|██████████| 634/634 [00:14<00:00, 52.47it/s]
Saving model checkpoint to Direct_v1/checkpoint-1000
Configuration saved in Direct_v1/checkpoint-1000/config.json
{'eval_loss': 0.6686900854110718, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 14.7566, 'eval_samples_per_second': 42.964, 'eval_steps_per_second': 42.964, 'epoch': 2.36}
Model weights saved in Direct_v1/checkpoint-1000/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-1000/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-1000/special_tokens_map.json
59%|█████▉ | 1250/2115 [11:40<07:14, 1.99it/s]
The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
59%|█████▉ | 1250/2115 [11:54<07:14, 1.99it/s]
100%|██████████| 634/634 [00:14<00:00, 52.63it/s]
Saving model checkpoint to Direct_v1/checkpoint-1250
Configuration saved in Direct_v1/checkpoint-1250/config.json
{'eval_loss': 0.6696870923042297, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 14.2725, 'eval_samples_per_second': 44.421, 'eval_steps_per_second': 44.421, 'epoch': 2.96}
Model weights saved in Direct_v1/checkpoint-1250/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-1250/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-1250/special_tokens_map.json
71%|███████ | 1500/2115 [14:01<05:09, 1.99it/s]
The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
{'loss': 0.6798, 'learning_rate': 5.815602836879432e-06, 'epoch': 3.55}
71%|███████ | 1500/2115 [14:16<05:09, 1.99it/s]
100%|██████████| 634/634 [00:14<00:00, 52.17it/s]
Saving model checkpoint to Direct_v1/checkpoint-1500
Configuration saved in Direct_v1/checkpoint-1500/config.json
{'eval_loss': 0.6706184148788452, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 14.5084, 'eval_samples_per_second': 43.699, 'eval_steps_per_second': 43.699, 'epoch': 3.55}
Model weights saved in Direct_v1/checkpoint-1500/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-1500/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-1500/special_tokens_map.json
Deleting older checkpoint [Direct_v1/checkpoint-250] due to args.save_total_limit
83%|████████▎ | 1750/2115 [16:25<03:03, 1.99it/s]
The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
83%|████████▎ | 1750/2115 [16:39<03:03, 1.99it/s]
100%|██████████| 634/634 [00:14<00:00, 50.95it/s]
Saving model checkpoint to Direct_v1/checkpoint-1750
Configuration saved in Direct_v1/checkpoint-1750/config.json
{'eval_loss': 0.6691468954086304, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 14.515, 'eval_samples_per_second': 43.679, 'eval_steps_per_second': 43.679, 'epoch': 4.14}
Model weights saved in Direct_v1/checkpoint-1750/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-1750/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-1750/special_tokens_map.json
Deleting older checkpoint [Direct_v1/checkpoint-500] due to args.save_total_limit
95%|█████████▍| 2000/2115 [18:48<00:58, 1.95it/s]
The following columns in the evaluation set don't have a corresponding argument in `RobertaForSequenceClassification.forward` and have been ignored: text. If text are not expected by `RobertaForSequenceClassification.forward`, you can safely ignore this message.
***** Running Evaluation *****
Num examples = 634
Batch size = 1
{'loss': 0.6784, 'learning_rate': 1.087470449172577e-06, 'epoch': 4.73}
95%|█████████▍| 2000/2115 [19:04<00:58, 1.95it/s]
100%|██████████| 634/634 [00:15<00:00, 50.16it/s]
Saving model checkpoint to Direct_v1/checkpoint-2000
Configuration saved in Direct_v1/checkpoint-2000/config.json
{'eval_loss': 0.6719586253166199, 'eval_accuracy': 0.610410094637224, 'eval_f1': 0.7580803134182175, 'eval_runtime': 15.2941, 'eval_samples_per_second': 41.454, 'eval_steps_per_second': 41.454, 'epoch': 4.73}
Model weights saved in Direct_v1/checkpoint-2000/pytorch_model.bin
tokenizer config file saved in Direct_v1/checkpoint-2000/tokenizer_config.json
Special tokens file saved in Direct_v1/checkpoint-2000/special_tokens_map.json
Deleting older checkpoint [Direct_v1/checkpoint-750] due to args.save_total_limit
100%|██████████| 2115/2115 [20:05<00:00, 2.05it/s]
Training completed. Do not forget to share your model on huggingface.co/models =)
100%|██████████| 2115/2115 [20:05<00:00, 1.75it/s]
{'train_runtime': 1205.4397, 'train_samples_per_second': 42.088, 'train_steps_per_second': 1.755, 'train_loss': 0.6791386345035922, 'epoch': 5.0}
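One way to sanity-check the log above is to compare it with a classifier that ignores its input and always predicts label 1. The sketch below assumes that 387 of the 634 evaluation examples have label 1; that count is not stated anywhere in this post, it is only inferred from eval_accuracy × 634. The function name constant_baseline is hypothetical. Under that assumption, the constant predictor gives the accuracy and F1 to compare with the log, and the entropy of the label distribution gives the loss of a model that outputs the class prior for every input.

```python
import math

N_EVAL = 634  # "Num examples" from the evaluation log
N_POS = 387   # ASSUMPTION: label-1 count, inferred as round(0.610410094637224 * 634)


def constant_baseline(n_total, n_pos):
    """Metrics for a classifier that predicts label 1 for every example."""
    p = n_pos / n_total
    accuracy = p          # every label-1 example is correct, every label-0 is wrong
    precision = p         # all predictions are 1, so precision equals the share of 1s
    recall = 1.0          # no label-1 example is missed
    f1 = 2 * precision * recall / (precision + recall)
    # Cross-entropy of always predicting the class prior (p, 1 - p).
    prior_loss = -(p * math.log(p) + (1 - p) * math.log(1 - p))
    return accuracy, f1, prior_loss


accuracy, f1, prior_loss = constant_baseline(N_EVAL, N_POS)
print(f"accuracy   = {accuracy:.6f}  (log: 0.610410)")
print(f"f1         = {f1:.6f}  (log: 0.758080)")
print(f"prior loss = {prior_loss:.4f}    (log eval_loss: about 0.6686 to 0.6720)")
```

If the three values line up with the log, the flat metrics would be consistent with the model predicting the same class for every evaluation example, rather than with anything related to checkpoint selection.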
I think this is quite weird, because the model seems to have learned something, yet eval_loss doesn't change during training. Does 'transformers.Trainer' select the best checkpoint automatically? I'm not sure whether this is an error or not.
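On the checkpoint question: as far as I understand, Trainer does not reload the best checkpoint unless it is asked to, so with the arguments above it only keeps the most recent checkpoints (up to save_total_limit) and ends with the final weights. Below is a minimal sketch of the extra keywords that request best-checkpoint loading. The dictionary names base_kwargs and best_checkpoint_kwargs are hypothetical; load_best_model_at_end, metric_for_best_model and greater_is_better are existing TrainingArguments parameters.

```python
# Settings copied from the training code in this post.
base_kwargs = {
    "output_dir": "Direct_v1",
    "save_strategy": "steps",
    "evaluation_strategy": "steps",
    "eval_steps": 250,
    "save_steps": 250,
    "save_total_limit": 5,
}

# Hypothetical addition: ask Trainer to reload the checkpoint with the
# lowest eval_loss once training finishes.
best_checkpoint_kwargs = {
    "load_best_model_at_end": True,
    "metric_for_best_model": "eval_loss",
    "greater_is_better": False,  # a lower loss is better
}

training_kwargs = {**base_kwargs, **best_checkpoint_kwargs}

# load_best_model_at_end needs the save and evaluation strategies to match,
# and save_steps to be a multiple of eval_steps.
assert training_kwargs["save_strategy"] == training_kwargs["evaluation_strategy"]
assert training_kwargs["save_steps"] % training_kwargs["eval_steps"] == 0

# With transformers installed, these would be passed on as:
# training_args = TrainingArguments(**training_kwargs)
```

Since every evaluation in the log reports almost the same eval_loss, picking the best checkpoint would probably not change much here, but it would rule out checkpoint selection as the explanation.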
** Edited on 4/25: I changed the compute_metrics function to
load_accuracy = load_metric("accuracy")
def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    predictions = np.argmax(predictions, axis=1)
    return load_accuracy.compute(predictions=predictions, references=labels)
and the training error decreased normally during training. I thought the problem was solved, but sometimes it still happens: the training error doesn't decrease with the same datasets (different checkpoints). Why does this happen?
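To tell a run that learns from a run that collapses onto one class, it might help to log how many examples fall into each predicted class alongside the accuracy. A minimal sketch in plain Python follows. predicted_class_counts is a hypothetical helper, and the logits in the usage lines are made-up values for illustration, not output from my model.

```python
from collections import Counter


def predicted_class_counts(logits):
    """Count argmax predictions per class for a list of per-example logit rows."""
    predictions = [max(range(len(row)), key=row.__getitem__) for row in logits]
    return dict(Counter(predictions))


# Made-up logits: every row scores class 1 higher than class 0.
example_logits = [[-0.2, 0.3], [-0.1, 0.4], [-0.3, 0.2]]
counts = predicted_class_counts(example_logits)
print(counts)  # {1: 3}
```

The same counts could be returned from compute_metrics as extra entries, so that a run where all predictions land in one class is visible at the first evaluation step.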
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow