Working with multiple inputs (image, text) in ResNet50 or any deep learning model
This is a somewhat abstract idea and I don't know the correct pipeline for implementing it. I have used a ResNet50 architecture to train a model that classifies images into 3 categories; one of the ideas I wanted to explore was also using the textual data associated with each image.
```python
from keras.preprocessing import image

# Directory-based generators for the image data (3 classes, categorical labels)
train_gen = image.ImageDataGenerator().flow_from_directory(dataset_path_train, target_size=input_shape[:2], batch_size=batch_size, class_mode='categorical', shuffle=True, seed=seed)
test_gen = image.ImageDataGenerator().flow_from_directory(dataset_path_valid, target_size=input_shape[:2], batch_size=batch_size, class_mode='categorical', shuffle=True, seed=seed)
```
That is the data prep for the model. For each image I also have a {text}, {label} key-value pair, and I would like to pass the text in as an additional feature, either via 1. Word2Vec or 2. TF-IDF.
I have read about the Embedding layer in Keras, but I am not sure how to feed the text data into the model alongside train_gen and test_gen (in some intermediate layer, or after Flatten()); a sketch of one Embedding-based option is included after the model code below.
```python
from keras.applications.resnet50 import ResNet50
from keras.layers import Flatten, Dense
from keras.models import Model

base_model = ResNet50(weights='imagenet', include_top=False,
                      input_shape=input_shape)

x = base_model.output
x = Flatten(name='flatten')(x)
predictions = Dense(3, activation='softmax', name='predictions')(x)
model = Model(inputs=base_model.input, outputs=predictions)

# Keep the first 141 layers trainable
for layer in model.layers[0:141]:
    layer.trainable = True

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit_generator(train_gen, steps_per_epoch=1000, epochs=2,
                    validation_steps=100, validation_data=test_gen, verbose=1)
```
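For reference, one way to wire an Embedding-based text branch into this model (this is only a sketch of the general idea, not part of the solution below; `vocab_size`, `max_len` and the tokenised text arrays are placeholder names you would derive from your own data) would be:

```python
from keras.layers import Input, Embedding, GlobalAveragePooling1D, Flatten, Dense, concatenate
from keras.models import Model

vocab_size, max_len = 20000, 100   # placeholders: vocabulary size and padded text length

# Text branch: integer word ids -> learned embeddings -> one pooled vector per image
text_input = Input(shape=(max_len,), name='text_input')
t = Embedding(input_dim=vocab_size, output_dim=64)(text_input)
t = GlobalAveragePooling1D()(t)

# Image branch: the same flattened ResNet50 features as above
img_features = Flatten(name='flatten')(base_model.output)

# Concatenate the two branches before the classifier head
merged = concatenate([img_features, t])
merged = Dense(256, activation='relu')(merged)
predictions = Dense(3, activation='softmax', name='predictions')(merged)

model = Model(inputs=[base_model.input, text_input], outputs=predictions)
```

With two inputs, flow_from_directory alone is no longer enough; the model would have to be fed either in-memory arrays or a custom generator that yields ([image_batch, text_batch], label_batch).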
Solution 1:[1]
This was solved after reading through the Keras guide on multi-input and multi-output models.
The text input was transformed with sklearn's TF-IDF vectorizer, which produced 26078 features per sample; the Input layer in the model must match this dimensionality. A sketch of that vectorisation step is shown below.
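As a rough sketch of that step (assuming the per-image texts have already been collected into a list `texts`, ordered the same way as the images; the variable names are placeholders), the TF-IDF features can be produced like this:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

# texts[i] is the text belonging to the i-th image, in the same order
# as the image array that will be fed to the image branch.
vectorizer = TfidfVectorizer()
text_feature = vectorizer.fit_transform(texts)                        # sparse matrix, shape (n_images, n_terms)
text_feature = np.asarray(text_feature.todense(), dtype='float32')   # Keras Input expects a dense array

print(text_feature.shape)   # e.g. (n_images, 26078) -> this sets the Input shape below
```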
```python
from keras.layers import Input, Dense, Flatten, concatenate
from keras.models import Model

x = base_model.output
x = Flatten(name='flatten')(x)                                    # image branch: flattened ResNet50 features
x2 = Input(shape=(26078,), dtype='float32', name='tfidf_input')   # text branch: TF-IDF vector

combinedInput = concatenate([x2, x])                              # merge text and image features
x = Dense(1024, activation="relu")(combinedInput)
# a regularizer can be added here, such as
# x = Dense(1024, activation="relu", activity_regularizer=regularizers.l1(0.01))
# x1 = Dense(512, activation='softmax', name='predictions')(x)
predictions = Dense(3, activation='softmax', name='predictions')(x)

model = Model(inputs=[base_model.input, x2], outputs=predictions)
for layer in model.layers[0:141]:
    layer.trainable = True
```
I have added an Input layer of dimension (26078,), which receives the text after it has been transformed to TF-IDF (this is what adds the text as a feature).
While fitting the data on the model:
```python
model.fit([image_data, text_feature], label, epochs=50, validation_split=0.10,
          steps_per_epoch=100, validation_steps=8, shuffle=True)
```
It takes 2 inputs: one is the ResNet image input and the other is an array of shape (datasize, 26078). A sketch of preparing those two aligned inputs is shown below.
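As a minimal sketch of assembling the two inputs (assuming `image_paths`, `texts` and `labels` are parallel lists, `text_feature` is the TF-IDF array built in the earlier sketch, and `input_shape` matches the ResNet50 input; these names are placeholders), the arrays passed to model.fit could be built like this:

```python
import numpy as np
from keras.preprocessing.image import load_img, img_to_array
from keras.applications.resnet50 import preprocess_input
from keras.utils import to_categorical

# image_paths[i] corresponds to texts[i] (and therefore to row i of text_feature)
image_data = np.stack([
    preprocess_input(img_to_array(load_img(p, target_size=input_shape[:2])))
    for p in image_paths
])
label = to_categorical(labels, num_classes=3)   # one-hot labels for the 3-way softmax

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit([image_data, text_feature], label, epochs=50, validation_split=0.10, shuffle=True)
```

Here steps_per_epoch and validation_steps are omitted because they are only needed when feeding generators rather than in-memory arrays.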
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | IamKarim1992 |
