Keras: Joining two branches in a Siamese model with 1D convolution
I'm trying to implement a Siamese model (i.e. a model with two identical input branches) for audio inference, described in chapters 2.2–2.3 of this paper. But I can't figure out how to implement the 1D convolution that is used to merge the two branches.
Drawing on another report (chapter III), I made a version that subtracts the two branch outputs instead, just to get the model to compile. I don't have audio data prepared for training yet, but the code below compiles and produces a model summary.
However, I really want to try merging by convolution, but I can't make sense of the description in the text, especially this section:
The first step is a location-preserving concatenation of x1 and x2 that creates a matrix Z merged of two "channels" and each channel is of size F. The second step is to perform a one-dimensional convolution of Z with a filter, that is subject to training, and projects down the number of channels to one. In addition to this, the one-dimensional filter has length B, essentially taking into account B elements for the comparison. We set B = 129, based on the frequency dimension of the feature extraction model. To avoid changes in the dimensionality of the resulting feature vector after convolution, we apply zero-padding of the order (B−1)/2.
(I've slightly altered some terms to match my code. Z and B are not in my code, since I don't understand how to implement this part.)
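As a sanity check on the quoted padding: for a stride-1 convolution, zero-padding of (B−1)/2 on each side is exactly what keeps the output length unchanged (this is what Keras calls `padding='same'`). The feature length F = 2560 below is just an illustrative value, not something the paper states:

```python
# Stride-1 1D convolution: out_len = in_len + 2*pad - (B - 1)
B = 129             # filter length from the paper
pad = (B - 1) // 2  # 64 zeros prepended and 64 appended
F = 2560            # hypothetical feature-vector length

out_len = F + 2 * pad - (B - 1)
print(out_len)      # 2560, i.e. the same as the input length
```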
The features are 128 Mel bands (a bit like FFT bins), so I assume the 129 comes from adding the zero-frequency bin, as in an FFT.
There may well be other errors in my attempt to implement this model from the text, but what interests me most here is advice on where to look to understand how to join two model branches by 1D convolution.
import keras.layers as kl
from keras.models import Model

def convSiam(input_shape, n_labels):

    def conv_block(inp, n_filters):
        # Shared building block: Conv -> BatchNorm -> LeakyReLU -> MaxPool
        x = kl.Conv2D(n_filters, (3, 3), strides=(1, 1), padding='same')(inp)
        x = kl.BatchNormalization(axis=1)(x)  # [1]
        x = kl.LeakyReLU(0.01)(x)
        x = kl.MaxPooling2D(pool_size=(2, 1), padding='same')(x)
        return x

    def branch(inp):
        x = conv_block(inp, 10)
        x = conv_block(x, 15)
        x = conv_block(x, 15)
        x = conv_block(x, 20)
        x = conv_block(x, 20)
        x = kl.AveragePooling2D(padding='same')(x)
        return x

    inp1 = kl.Input(input_shape)
    x1 = branch(inp1)

    inp2 = kl.Input(input_shape)
    x2 = branch(inp2)

    # Here the branches should be merged by 1D convolution, but how?
    merged = kl.Subtract()([x1, x2])  # placeholder, just to make it compile
    merged = kl.Dense(2560, activation=None)(merged)
    merged = kl.LeakyReLU(0.01)(merged)
    merged = kl.Dense(440, activation='relu')(merged)
    output = kl.Dense(n_labels, activation='softmax')(merged)
    # final reshape to matrix here, not important

    model = Model(inputs=[inp1, inp2], outputs=output)
    model.summary()
    model.compile(loss='categorical_crossentropy', optimizer='adam',
                  metrics=['acc'])
    return model
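For reference, here is one way I can read the quoted passage: flatten each branch output to a vector of length F, stack the two vectors as two channels of a matrix Z of shape (F, 2), and run a trainable `Conv1D` with a single filter of length B = 129 and `padding='same'` over it. This is only a sketch of the merge step in isolation; F = 2560 is an assumed feature length (matching the first Dense layer above), not something the paper specifies:

```python
import numpy as np
import keras.layers as kl
from keras.models import Model

F = 2560   # assumed per-branch feature-vector length (hypothetical)
B = 129    # filter length from the paper

inp1 = kl.Input((F,))  # stand-ins for the flattened branch outputs
inp2 = kl.Input((F,))

# Step 1: location-preserving concatenation -> matrix Z of shape (F, 2),
# i.e. two "channels", each of size F.
z = kl.Concatenate(axis=-1)([kl.Reshape((F, 1))(inp1),
                             kl.Reshape((F, 1))(inp2)])

# Step 2: a trainable 1D convolution with one filter of length B projects
# the two channels down to one; padding='same' zero-pads by (B-1)/2 on each
# side, so the output keeps length F.
merged = kl.Conv1D(filters=1, kernel_size=B, padding='same')(z)
merged = kl.Reshape((F,))(merged)  # back to a flat feature vector

model = Model(inputs=[inp1, inp2], outputs=merged)
```

In the full model this would replace the `Subtract` placeholder, with a `Flatten` (or equivalent reshape) applied to each branch first so the two tensors really are vectors of length F.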
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow