How to use CPU only for Embedding?
I need to avoid this error: tensorflow.python.framework.errors_impl.InternalError: Failed copying input tensor from /job:localhost/replica:0/task:0/device:CPU:0 to /job:localhost/replica:0/task:0/device:GPU:0 in order to run _EagerConst: Dst tensor is not initialized. It seems to be caused by my RTX 3060 running out of memory, and to avoid it I apparently have to run the Embedding layer calculations on the CPU, but how? I tried running the whole model on the CPU and it works fine, but it is very slow. For example, if I reduce the number of neurons in all layers to 128, then I can use 8000 sentences for training (data_list[:8000] instead of the 6000 in the example below), but I have about 20,000 of them.
My model:
# Imports assumed for this snippet (exact module paths may vary by TF/Keras version):
# from keras.layers import Embedding
# from keras.utils import tf_utils
# from tensorflow.python.framework import ops

class CPUEmbedding(Embedding):
    @tf_utils.shape_type_conversion
    def build(self, input_shape):
        # force the embedding weights to live on the CPU
        with ops.device('cpu:0'):
            self.embeddings = self.add_weight(
                shape=(self.input_dim, self.output_dim),
                initializer=self.embeddings_initializer,
                name='embeddings',
                regularizer=self.embeddings_regularizer,
                constraint=self.embeddings_constraint)
        self.built = True
        print('Embedding starts on cpu')
model = Sequential()
model.add(CPUEmbedding(19260, 256, input_length=163))
model.add(LSTM(256, return_sequences=True)) # the output will be a sequence of the same length
model.add(Dropout(0.2))
model.add(LSTM(512))
model.add(Dropout(0.2))
model.add(Dense(self.total_words, activation='softmax'))
adam = Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['acc'])
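To sanity-check that the override actually worked, you can inspect the device of the created variables once the model is built; a minimal sketch (not part of the original post, it only uses the standard tf.Variable.device attribute):

# Hypothetical check, assuming the Sequential model above has been built:
print(model.layers[0].embeddings.device)
# expect something like '/job:localhost/replica:0/task:0/device:CPU:0'
for w in model.weights:
    print(w.name, '->', w.device)   # the other weights will typically sit on the GPU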
Model summary:
Model: "sequential"
_________________________________________________________________
Layer (type)                  Output Shape              Param #
=================================================================
cpu_embedding (CPUEmbedding)  (None, 163, 256)          4930560
lstm (LSTM)                   (None, 163, 256)          525312
dropout (Dropout)             (None, 163, 256)          0
lstm_1 (LSTM)                 (None, 512)               1574912
dropout_1 (Dropout)           (None, 512)               0
dense (Dense)                 (None, 19260)             9880380
=================================================================
Total params: 16,911,164
Trainable params: 16,911,164
Non-trainable params: 0
A model that you can run, but first you will need to download some big book and save it as text.txt:
from keras.preprocessing.sequence import pad_sequences
from keras.layers import Embedding, LSTM, Dense, Dropout
from keras.preprocessing.text import Tokenizer
from keras.models import Sequential
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
import numpy as np
tokenizer = Tokenizer()
# Book with len > 1 000 000 words
with open('text.txt', encoding='utf-8') as f:
    data = f.read().replace('\ufeff', '')
data_list = data.lower().split("\n")
tokenizer.fit_on_texts(data_list)
total_words = len(tokenizer.word_index) + 1
print('Words number:', total_words)
input_sequences = []
for line in data_list:
    token_list = tokenizer.texts_to_sequences([line])[0]
    for i in range(1, len(token_list)):
        n_gram_sequence = token_list[:i + 1]
        input_sequences.append(n_gram_sequence)
max_sequence_len = max([len(x) for x in input_sequences])
input_sequences = np.array(pad_sequences(input_sequences, maxlen=max_sequence_len, padding='pre'))
X, labels = input_sequences[:, :-1], input_sequences[:, -1]
Y = to_categorical(labels, num_classes=total_words)
model = Sequential()
model.add(Embedding(total_words, 256, input_length=max_sequence_len - 1))
model.add(LSTM(256, return_sequences=True))
model.add(Dropout(0.1))
model.add(LSTM(512))
model.add(Dropout(0.1))
model.add(Dense(total_words, activation='softmax'))
adam = Adam(learning_rate=0.001)
model.compile(loss='categorical_crossentropy', optimizer=adam, metrics=['acc'])
history = model.fit(x=X, y=Y, batch_size=128, epochs=1000)
Versions:
1)
- OS: Windows 10
- Cuda: 11.6 (latest, from nvidia site)
- python: 3.9
- tensorflow: 2.8
- starts in: cmd
- GPU: 3060
2)
- OS: Windows 11
- Cuda: downloaded by conda
- python: 3.8
- tensorflow: 2.6
- starts in: conda
- GPU: 1060 Ti
Solution 1:[1]
It might be a memory issue: you may not have enough RAM to copy the embeddings from CPU to GPU. Monitor your RAM and GPU usage. If it takes too much memory, instead of storing all 20,000 sentences in a single variable, try a custom data generator that produces batches only as they are needed; that way you can save a lot of space. Let me know if it works.
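If you want a quick way to watch GPU and host memory from inside the training script, here is a rough sketch (the memory-info call needs TF 2.5+; psutil is an extra, assumed dependency):

import tensorflow as tf
import psutil   # assumed to be installed; only used for host RAM

# current / peak GPU memory allocated by TensorFlow, in bytes
info = tf.config.experimental.get_memory_info('GPU:0')
print('GPU current: %.1f MiB, peak: %.1f MiB'
      % (info['current'] / 2**20, info['peak'] / 2**20))

# host RAM usage
print('RAM used: %.1f GiB of %.1f GiB'
      % (psutil.virtual_memory().used / 2**30,
         psutil.virtual_memory().total / 2**30))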
Points to consider:
Try changing your hyperparameters, e.g. reducing the number of neurons; 19,260 output neurons is huge. For a classification task, use as many output neurons as there are classes: if you have 5 classes, use 5 neurons.
Reducing your batch size may also help.
Try to find out which memory gets exhausted while training. If it is the RAM, the custom data generator will help; if it is the GPU, you have to reduce the parameter count. I'm guessing that for 16,911,164 params you need at least 16 GB of GPU memory, so you should consider minimizing this. (A back-of-the-envelope check of where the memory can go is sketched right after this list.)
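As an illustration of that last point, here is a rough estimate of the one-hot label matrix produced by to_categorical; the sequence count below is a made-up assumption, so substitute your own:

# Illustrative numbers only: the real sequence count depends on the book.
num_sequences = 500_000    # assumed number of n-gram sequences built from the text
num_classes = 19_260       # vocabulary size from the post
bytes_per_value = 4        # to_categorical returns float32 by default

one_hot_bytes = num_sequences * num_classes * bytes_per_value
print('One-hot labels alone: %.1f GiB' % (one_hot_bytes / 2**30))   # roughly 36 GiB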
Custom data generator example
If RAM is the problem, then this might help, assuming that you have pre-processed the data and saved it in a text file or in CSV format.
This pattern comes from a custom image data generator, but you will get the overall idea.
The sample code below is not a fully working example; it is just meant to give you an idea of a custom generator.
import csv
import numpy as np
import tensorflow as tf

def custom_gen(batch_size, file_path):
    sentences = []
    labels = []
    with open(file_path) as file:
        csvreader = csv.reader(file)
        # the first row is the header, so skip it
        _ = next(csvreader)
        # assuming you have only two columns: [sentence, label]
        for data in csvreader:  # each row is a list with one item per column
            sentences.append(data[0])
            labels.append(data[1])
            if len(sentences) == batch_size:
                # always make sure the batch has the desired shape and datatype
                yield np.array(sentences), np.array(labels)
                sentences.clear()
                labels.clear()

# finally wrap the function in a tf.data.Dataset generator
dataset = tf.data.Dataset.from_generator(
    lambda: custom_gen(batch_size=128, file_path='data.csv'),
    output_signature=(
        tf.TensorSpec(shape=(None,), dtype=tf.string),   # adjust to your sentences' batch shape/dtype
        tf.TensorSpec(shape=(None,), dtype=tf.string)))  # adjust to your labels' batch shape/dtype

# repeat so it can generate data for as many epochs as you want;
# prefetch helps you to manage memory
dataset = dataset.prefetch(buffer_size=tf.data.AUTOTUNE).repeat(1000)

# you can then fit the model with your custom data generator
model.fit(dataset, epochs=1000)  # no need for separate x and y values
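For completeness, the generator above assumes the pre-processed data already sits in a CSV file; a hypothetical sketch of that saving step (the file name and column layout are made up to match the generator above) could look like:

import csv

# Write one padded sequence and its label per row so the generator can stream them.
with open('data.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    writer.writerow(['sentence', 'label'])   # header row, skipped by the generator
    for seq, label in zip(X, labels):
        # token ids stored as a space-separated string; the generator would need
        # to split them and convert back to int before batching
        writer.writerow([' '.join(map(str, seq)), int(label)])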
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
