Performance issues with TensorFlow on an M1 Apple silicon chip

I am working with TensorFlow on a MacBook Pro with the M1 chip. After (painfully) managing to set up the proper environment and install tensorflow-macos following this guide, I am now trying to fine-tune a BERT model. My environment uses Python 3.8, and I use Atom with Hydrogen. The dataset I am working with is not public, but I share a snippet of my code here:
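For context, this is the minimal sanity check I run to confirm that the install can see the GPU (the exact output depends on your tensorflow-macos and tensorflow-metal versions):

import tensorflow as tf

# On a working tensorflow-macos + tensorflow-metal install, the GPU list
# should contain one PhysicalDevice entry for the M1's integrated GPU.
print(tf.__version__)
print(tf.config.list_physical_devices("GPU"))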

import tensorflow as tf
from transformers import AutoTokenizer, AutoConfig, DataCollatorWithPadding
from transformers import TFAutoModelForSequenceClassification
from tensorflow.keras.losses import SparseCategoricalCrossentropy

tokenizer = AutoTokenizer.from_pretrained(
    "bert-base-cased", padding='max_length', truncation=True)
def tokenize_function(example):
    return tokenizer(example["tidy_tweet"], truncation=True)

# `dataset` is a datasets.DatasetDict with "train" and "test" splits,
# loaded earlier (the data itself is not public).
tokenized_datasets = dataset.map(tokenize_function, batched=True)

data_collator = DataCollatorWithPadding(
    tokenizer=tokenizer, return_tensors="tf")
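
As I understand it, this collator pads each batch dynamically to the length of its longest sequence instead of padding everything to a global maximum. A toy check (the token IDs here are made up):

# Two encoded examples of different lengths; the batch is padded to the
# longer one, so input_ids comes back with shape (2, 5).
features = [{"input_ids": [101, 7592, 102]},
            {"input_ids": [101, 7592, 2088, 999, 102]}]
batch = data_collator(features)
print(batch["input_ids"].shape)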


tf_train_dataset = tokenized_datasets["train"].to_tf_dataset(
    columns=["attention_mask", "input_ids", "token_type_ids"],
    label_cols=["classify"],
    shuffle=True,
    collate_fn=data_collator,
    batch_size=8,
)
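
Since I reference tf_test_dataset below, for completeness it is built the same way from the test split (assuming the split is named "test"):

tf_test_dataset = tokenized_datasets["test"].to_tf_dataset(
    columns=["attention_mask", "input_ids", "token_type_ids"],
    label_cols=["classify"],
    shuffle=False,
    collate_fn=data_collator,
    batch_size=8,
)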

Running this code reports Metal device set to: Apple M1 Pro, from which I infer that the Metal plug-in has detected my chip properly.

Then, when trying to fine-tune the model:

# The checkpoint must match the tokenizer used above ("bert-base-cased").
model = TFAutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2)


model.compile(
    optimizer="adam",
    loss=SparseCategoricalCrossentropy(from_logits=True),
    metrics=["accuracy"],
)
model.fit(
    tf_train_dataset,
    validation_data=tf_test_dataset,
)
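
For comparison, one thing I can do is hide the GPU and time an epoch on the CPU alone, to see whether the Metal plug-in is actually helping. A sketch (tf.config.set_visible_devices has to run at the start of a fresh session, before anything touches the GPU):

import tensorflow as tf

# Hide the Metal GPU so the fit runs entirely on the CPU; this only works
# if it executes before TensorFlow has initialized the GPU device.
tf.config.set_visible_devices([], "GPU")

model.fit(tf_train_dataset, validation_data=tf_test_dataset)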

The code is extremely slow: it takes more than 10 minutes to go over a single epoch. My training sample is about 3,000 (WordPiece-tokenized) tweets. The output from the model is:

2022-03-05 16:12:39.271128: W tensorflow/core/platform/profile_utils/cpu_utils.cc:128] Failed to get CPU frequency: 0 Hz
2022-03-05 16:12:45.678284: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
2022-03-05 16:35:33.128057: I tensorflow/core/grappler/optimizers/custom_graph_optimizer_registry.cc:113] Plugin optimizer for device_type GPU is enabled.
503/503 [==============================] - 1489s 3s/step - loss: 0.6477 - accuracy: 0.6802 - val_loss: 0.6198 - val_accuracy: 0.69

<keras.callbacks.History at 0x3910c0e50>

Is this time normal? I ran some trials on bigger datasets in Google Colab and was able to run several epochs much faster. Also, every time I have tried to fit the model, I get a system warning saying that my machine has run out of application memory. How is this possible?

Another issue is that all the predictions obtained from the model are exactly the same:

preds = model.predict(tf_test_dataset)["logits"]

outputs:

array([[ 0.40131712, -0.30243358],
       [ 0.40131715, -0.3024336 ],
       [ 0.40131712, -0.30243355],
       ...,
       [ 0.40131715, -0.30243358],
       [ 0.40131712, -0.30243355],
       [ 0.40131715, -0.30243355]], dtype=float32)
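
Converting the logits to class probabilities (just to show there is no post-processing on my side) gives, as expected from the identical logits, essentially the same row for every example:

import tensorflow as tf

# Softmax over the class axis; since every row has (almost) the same
# logits, every example gets (almost) the same probabilities.
probs = tf.nn.softmax(preds, axis=-1).numpy()
print(probs[:3])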


Source: Stack Overflow, licensed under CC BY-SA 3.0.