Working with multiple inputs (image, text) in ResNet50 or any deep learning model
This is a somewhat abstract idea and I don't know the correct pipeline for implementing it. I have used a ResNet50 architecture to train a model that classifies images into 3 categories; one of the ideas I wanted to explore was also using the textual data associated with each image.
```python
from keras.preprocessing import image

# Directory-based generators for the image data (3 classes, categorical labels)
train_gen = image.ImageDataGenerator().flow_from_directory(dataset_path_train, target_size=input_shape[:2], batch_size=batch_size, class_mode='categorical', shuffle=True, seed=seed)
test_gen = image.ImageDataGenerator().flow_from_directory(dataset_path_valid, target_size=input_shape[:2], batch_size=batch_size, class_mode='categorical', shuffle=True, seed=seed)
```
That is the data prep for the model. For each image I also have a {text}, {label} key-value pair, and I would like to pass the text in as an additional feature, either via 1. Word2Vec or 2. TF-IDF.
I have read about the Embedding layer in Keras, but I am not sure how to feed the text data into the model alongside train_gen and test_gen (in some intermediate layer, or after Flatten()); a sketch of one Embedding-based option is included after the model code below.
```python
from keras.applications.resnet50 import ResNet50
from keras.layers import Flatten, Dense
from keras.models import Model

base_model = ResNet50(weights='imagenet', include_top=False,
                      input_shape=input_shape)

x = base_model.output
x = Flatten(name='flatten')(x)
predictions = Dense(3, activation='softmax', name='predictions')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# Keep the first 141 layers trainable
for layer in model.layers[0:141]:
    layer.trainable = True

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(train_gen, steps_per_epoch=1000, epochs=2,
                    validation_steps=100, validation_data=test_gen, verbose=1)
```
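For reference, one way to wire an Embedding-based text branch into this model (this is only a sketch of the general idea, not part of the solution below; `vocab_size`, `max_len` and the tokenised text arrays are placeholder names you would derive from your own data) would be:

```python
from keras.layers import Input, Embedding, GlobalAveragePooling1D, Flatten, Dense, concatenate
from keras.models import Model

vocab_size, max_len = 20000, 100   # placeholders: vocabulary size and padded text length

# Text branch: integer word ids -> learned embeddings -> one pooled vector per image
text_input = Input(shape=(max_len,), name='text_input')
t = Embedding(input_dim=vocab_size, output_dim=64)(text_input)
t = GlobalAveragePooling1D()(t)

# Image branch: the same flattened ResNet50 features as above
img_features = Flatten(name='flatten')(base_model.output)

# Concatenate the two branches before the classifier head
merged = concatenate([img_features, t])
merged = Dense(256, activation='relu')(merged)
predictions = Dense(3, activation='softmax', name='predictions')(merged)

model = Model(inputs=[base_model.input, text_input], outputs=predictions)
```

With two inputs, flow_from_directory alone is no longer enough; the model would have to be fed either in-memory arrays or a custom generator that yields ([image_batch, text_batch], label_batch).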
Solution 1:[1]
This was solved after reading through the Keras guide on multi-input and multi-output models.
The text input was transformed with sklearn's TF-IDF vectorizer, which produced 26078 features per sample; the Input layer in the model must match this dimensionality. A sketch of that vectorisation step is shown below.
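As a rough sketch of that step (assuming the per-image texts have already been collected into a list `texts`, ordered the same way as the images; the variable names are placeholders), the TF-IDF features can be produced like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

# texts[i] is the text belonging to the i-th image, in the same order
# as the image array that will be fed to the image branch.
vectorizer = TfidfVectorizer()
text_feature = vectorizer.fit_transform(texts)                        # sparse matrix, shape (n_images, n_terms)
text_feature = np.asarray(text_feature.todense(), dtype='float32')   # Keras Input expects a dense array

print(text_feature.shape)   # e.g. (n_images, 26078) -> this sets the Input shape below
```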
```python
from keras.layers import Input, Dense, Flatten, concatenate
from keras.models import Model

x = base_model.output
x = Flatten(name='flatten')(x)                                    # image branch: flattened ResNet50 features
x2 = Input(shape=(26078,), dtype='float32', name='tfidf_input')   # text branch: TF-IDF vector

combinedInput = concatenate([x2, x])                              # merge text and image features
x = Dense(1024, activation="relu")(combinedInput)
# a regularizer can be added here, such as
# x = Dense(1024, activation="relu", activity_regularizer=regularizers.l1(0.01))
# x1 = Dense(512, activation='softmax', name='predictions')(x)
predictions = Dense(3, activation='softmax', name='predictions')(x)

model = Model(inputs=[base_model.input, x2], outputs=predictions)
for layer in model.layers[0:141]:
    layer.trainable = True
```
I have added an Input layer of dimension (26078,), which receives the text after it has been transformed to TF-IDF (this is what adds the text as a feature).
While fitting the data on the model:
```python
model.fit([image_data, text_feature], label, epochs=50, validation_split=0.10,
          steps_per_epoch=100, validation_steps=8, shuffle=True)
```
It takes 2 inputs: one is the ResNet image input and the other is an array of shape (datasize, 26078). A sketch of preparing those two aligned inputs is shown below.
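As a minimal sketch of assembling the two inputs (assuming `image_paths`, `texts` and `labels` are parallel lists, `text_feature` is the TF-IDF array built in the earlier sketch, and `input_shape` matches the ResNet50 input; these names are placeholders), the arrays passed to model.fit could be built like this:

```python
import numpy as np
from keras.preprocessing.image import load_img, img_to_array
from keras.applications.resnet50 import preprocess_input
from keras.utils import to_categorical

# image_paths[i] corresponds to texts[i] (and therefore to row i of text_feature)
image_data = np.stack([
    preprocess_input(img_to_array(load_img(p, target_size=input_shape[:2])))
    for p in image_paths
])
label = to_categorical(labels, num_classes=3)   # one-hot labels for the 3-way softmax

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit([image_data, text_feature], label, epochs=50, validation_split=0.10, shuffle=True)
```

Here steps_per_epoch and validation_steps are omitted because they are only needed when feeding generators rather than in-memory arrays.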
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | IamKarim1992 |
