How to use CPU only for Embedding?
I need to avoid this error: tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized. It seems to be caused by my RTX 3060 running out of memory, and to avoid it I apparently have to run the Embedding layer calculations on the CPU, but how? I tried running the whole model on the CPU and it works fine, but it is very slow. For example, if I reduce the number of neurons in all layers to 128, then I can use 8000 sentences for training (data_list[:8000] instead of the 6000 in the example below), but I have about 20,000 of them.
My model:
# Imports assumed for this snippet (exact module paths may vary by TF/Keras version):
# from keras.layers import Embedding
# from keras.utils import tf_utils
# from tensorflow.python.framework import ops

class CPUEmbedding(Embedding):
    @tf_utils.shape_type_conversion
    def build(self, input_shape):
        # force the embedding weights to live on the CPU
        with ops.device('cpu:0'):
            self.embeddings = self.add_weight(
                shape=(self.input_dim, self.output_dim),
                initializer=self.embeddings_initializer,
                name='embeddings',
                regularizer=self.embeddings_regularizer,
                constraint=self.embeddings_constraint)
        self.built = True
        print('Embedding starts on cpu')
model = Sequential()
model.add(CPUEmbedding(19260, 256, input_length=163))
model.add(LSTM(256, return_sequences=True)) # the output will be a sequence of the same length
model.add(Dropout(0.2))
model.add(LSTM(512))
model.add(Dropout(0.2))
model.add(Dense(self.total_words, activation='softmax'))
adam = Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['acc'])
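To sanity-check that the override actually worked, you can inspect the device of the created variables once the model is built; a minimal sketch (not part of the original post, it only uses the standard tf.Variable.device attribute):

# Hypothetical check, assuming the Sequential model above has been built:
print(model.layers[0].embeddings.device)
# expect something like '/job:localhost/replica:0/task:0/device:CPU:0'
for w in model.weights:
    print(w.name, '->', w.device)   # the other weights will typically sit on the GPU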
Model summary:
Model: "sequential"
_________________________________________________________________
Layer (type)                  Output Shape              Param #
=================================================================
cpu_embedding (CPUEmbedding)  (None, 163, 256)          4930560
lstm (LSTM)                   (None, 163, 256)          525312
dropout (Dropout)             (None, 163, 256)          0
lstm_1 (LSTM)                 (None, 512)               1574912
dropout_1 (Dropout)           (None, 512)               0
dense (Dense)                 (None, 19260)             9880380
=================================================================
Total params: 16,911,164
Trainable params: 16,911,164
Non-trainable params: 0
A model that you can run, but first you will need to download some big book and save it as text.txt:
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Embedding, LSTM, Dense, Dropout
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
import numpy as np
tokenizer = Tokenizer()
# Book with len > 1 000 000 words
with open('text.txt', encoding='utf-8') as f:
    data = f.read().replace('\ufeff', '')
data_list = data.lower().split("\n")
tokenizer.fit_on_texts(data_list)
total_words = len(tokenizer.word_index) + 1
print('Words number:', total_words)
input_sequences = []
for line in data_list:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i + 1]
        input_sequences.append(n_gram_sequence)
max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))
X, labels = input_sequences[:, :-1], input_sequences[:, -1]
Y = to_categorical(labels, num_classes=total_words)
model = Sequential()
model.add(Embedding(total_words, 256, input_length=max_sequence_len - 1))
model.add(LSTM(256, return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(512))
model.add(Dropout(0.1))
model.add(Dense(total_words, activation='softmax'))
adam = Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['acc'])
history = model.fit(x=X, y=Y, batch_size=128, epochs=1000)
Versions:
1)
- OS: Windows 10
- Cuda: 11.6 (latest, from nvidia site)
- python: 3.9
- tensorflow: 2.8
- starts in: cmd
- GPU: 3060
2)
- OS: Windows 11
- Cuda: downloaded by conda
- python: 3.8
- tensorflow: 2.6
- starts in: conda
- GPU: 1060 Ti
Solution 1:[1]
It might be a memory issue: you may not have enough RAM to copy the embeddings from CPU to GPU. Monitor your RAM and GPU usage. If it takes too much memory, instead of storing all 20,000 sentences in a single variable, try a custom data generator that produces batches only as they are needed; that way you can save a lot of space. Let me know if it works.
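If you want a quick way to watch GPU and host memory from inside the training script, here is a rough sketch (the memory-info call needs TF 2.5+; psutil is an extra, assumed dependency):

import tensorflow as tf
import psutil   # assumed to be installed; only used for host RAM

# current / peak GPU memory allocated by TensorFlow, in bytes
info = tf.config.experimental.get_memory_info('GPU:0')
print('GPU current: %.1f MiB, peak: %.1f MiB'
      % (info['current'] / 2**20, info['peak'] / 2**20))

# host RAM usage
print('RAM used: %.1f GiB of %.1f GiB'
      % (psutil.virtual_memory().used / 2**30,
         psutil.virtual_memory().total / 2**30))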
Points to consider:
Try changing your hyperparameters, e.g. reducing the number of neurons; 19,260 output neurons is huge. For a classification task, use as many output neurons as there are classes: if you have 5 classes, use 5 neurons.
Reducing your batch size may also help.
Try to find out which memory gets exhausted while training. If it is the RAM, the custom data generator will help; if it is the GPU, you have to reduce the parameter count. I'm guessing that for 16,911,164 params you need at least 16 GB of GPU memory, so you should consider minimizing this. (A back-of-the-envelope check of where the memory can go is sketched right after this list.)
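As an illustration of that last point, here is a rough estimate of the one-hot label matrix produced by to_categorical; the sequence count below is a made-up assumption, so substitute your own:

# Illustrative numbers only: the real sequence count depends on the book.
num_sequences = 500_000    # assumed number of n-gram sequences built from the text
num_classes = 19_260       # vocabulary size from the post
bytes_per_value = 4        # to_categorical returns float32 by default

one_hot_bytes = num_sequences * num_classes * bytes_per_value
print('One-hot labels alone: %.1f GiB' % (one_hot_bytes / 2**30))   # roughly 36 GiB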
Custom data generator example
If RAM is the problem, then this might help, assuming that you have pre-processed the data and saved it in a text file or in CSV format.
This pattern comes from a custom image data generator, but you will get the overall idea.
The sample code below is not a fully working example; it is just meant to give you an idea of a custom generator.
import csv
import numpy as np
import tensorflow as tf

def custom_gen(batch_size, file_path):
    sentences = []
    labels = []
    with open(file_path) as file:
        csvreader = csv.reader(file)
        # the first row is the header, so skip it
        _ = next(csvreader)
        # assuming you have only two columns: [sentence, label]
        for data in csvreader:  # each row is a list with one item per column
            sentences.append(data[0])
            labels.append(data[1])
            if len(sentences) == batch_size:
                # always make sure the batch has the desired shape and datatype
                yield np.array(sentences), np.array(labels)
                sentences.clear()
                labels.clear()

# finally wrap the function in a tf.data.Dataset generator
dataset = tf.data.Dataset.from_generator(
    lambda: custom_gen(batch_size=128, file_path='data.csv'),
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.string),   # adjust to your sentences' batch shape/dtype
        tf.TensorSpec(shape=(None,), dtype=tf.string)))  # adjust to your labels' batch shape/dtype

# repeat so it can generate data for as many epochs as you want;
# prefetch helps you to manage memory
dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE).repeat(1000)

# you can then fit the model with your custom data generator
model.fit(dataset, epochs=1000)  # no need for separate x and y values
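For completeness, the generator above assumes the pre-processed data already sits in a CSV file; a hypothetical sketch of that saving step (the file name and column layout are made up to match the generator above) could look like:

import csv

# Write one padded sequence and its label per row so the generator can stream them.
with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['sentence', 'label'])   # header row, skipped by the generator
    for seq, label in zip(X, labels):
        # token ids stored as a space-separated string; the generator would need
        # to split them and convert back to int before batching
        writer.writerow([' '.join(map(str, seq)), int(label)])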
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
