Using a 2D kernel on 1D input in a convolutional network

I am trying to build a TasNet model, the audio separation network, following the original paper. In section 2.2.1 the authors describe the encoder as a CNN combined with a gated CNN. The operation is presented as follows:

w_k = ReLU(x_k ⊛ U) ⊙ σ(x_k ⊛ V)

where w_k is a vector of weights for some basis signals, x_k is the audio mixture so that x_k ∈ ℝ^(1×L), ⊛ denotes convolution, ⊙ denotes element-wise multiplication, σ is the sigmoid function, and U, V ∈ ℝ^(N×L).

This is really weird because it implies, as far as I understand, that a 2D kernel is used to convolve a 1D signal. TensorFlow's Conv1D does not support a kernel with more than one dimension.

Am I misunderstanding something? If not, how can I convolve a 1D signal using a 2D kernel? Thank you in advance!



Solution 1:[1]

It does not have to be specifically Conv1D or Conv2D; what changes is the shape of the input data. I spent some time reading the paper you referenced, and it is about similarity matching, which is good practice for neural networks.

In practice, supervised learning needs both inputs and labels. Without labels for the musical input (I downloaded a sample from the Internet), you can loop over the data so that the windows are matched for similarity against each other.

The paper describes an encoder and a decoder, which are the main components that handle the signal. In practice any model can work: LSTM, dense, Conv1D, Conv2D, or Conv3D.
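As a minimal sketch of how the gated formula from the question can be read, assume the N rows of U and V act as N separate 1D filters of length L. Because the segment also has length L, each "convolution" then reduces to one dot product per filter, so no 2D kernel is involved. The function name `gated_encoder` and the sizes N = 4, L = 8 are hypothetical and chosen only for illustration; this is one interpretation, not the paper's reference implementation.

```python
import numpy as np

def gated_encoder(x_k, U, V):
    """Gated encoding of one segment: ReLU(U x_k) * sigmoid(V x_k).

    Assumption: U and V have shape (N, L), i.e. N one-dimensional filters
    of length L, and x_k is a single segment of length L.
    """
    x_k = np.asarray(x_k, dtype=float).reshape(-1)  # flatten (1, L) to (L,)
    conv_u = U @ x_k                      # N dot products, one per row of U
    conv_v = V @ x_k                      # N dot products, one per row of V
    relu = np.maximum(conv_u, 0.0)        # ReLU branch
    gate = 1.0 / (1.0 + np.exp(-conv_v))  # sigmoid gate in (0, 1)
    return relu * gate                    # element-wise product, shape (N,)

# Hypothetical sizes, for illustration only.
rng = np.random.default_rng(0)
N, L = 4, 8
U = rng.standard_normal((N, L))
V = rng.standard_normal((N, L))
x_k = rng.standard_normal(L)
w_k = gated_encoder(x_k, U, V)
```

Under the same assumption, the Keras counterpart would be two `Conv1D` layers with `filters=N` and `kernel_size=L` applied to an input of shape `(batch, time, 1)`, one with a ReLU activation and one with a sigmoid activation, with their outputs multiplied element-wise.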

( 1 ) : Input

from scipy.io.wavfile import read
samplerate, data = read("F:\\temp\\Python\\Speech\\temple_of_love-sisters_of_mercy.wav")

( 2 ) : Windows ( Hamming or fixed-size extracts )

For instance, I am currently using fixed-size windows.

If you do not provide musical instrument labels, you can assign arbitrary values at first and then update them with similarity values later.

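import numpy as np

# Assumed setup (not shown in the original answer): one channel, windows taken backwards from the end.
window_size = 147 * 15                                # 2205 samples, to match the reshape below
next_window = window_size                             # assumed step between windows
sample_data = data if data.ndim == 1 else data[:, 0]  # assumed: first channel only
start_index = len(sample_data)
current_window = start_index - window_size

input_array = []
for i in range(10):
    input_array.append(np.reshape(sample_data[current_window:start_index], (1, 147, 15, 1)))
    current_window -= next_window
    start_index -= next_window

label_array = list(range(10))  # placeholder labels 0..9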
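The windowing step can also be sketched as a small helper that cuts a 1D signal into fixed-size frames and optionally weights each frame with a Hamming window. The name `frame_signal` and its parameters `window_size`, `hop`, and `use_hamming` are hypothetical, and the demo signal is synthetic; this is an illustration of the idea, not the code used in the answer.

```python
import numpy as np

def frame_signal(signal, window_size, hop, use_hamming=False):
    """Cut a 1D signal into frames of window_size samples, hop samples apart."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < window_size:
        raise ValueError("signal is shorter than one window")
    n_frames = 1 + (len(signal) - window_size) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + window_size] for i in range(n_frames)]
    )
    if use_hamming:
        # Taper every frame with the same Hamming window (broadcast over rows).
        frames = frames * np.hamming(window_size)
    return frames

# Synthetic demo signal: 10 samples, windows of 4 samples with a hop of 2.
demo = np.arange(10.0)
frames = frame_signal(demo, window_size=4, hop=2)
tapered = frame_signal(np.ones(8), window_size=4, hop=4, use_hamming=True)
```

With a hop smaller than the window size the frames overlap, which is the usual reason to taper them with a Hamming window; with a hop equal to the window size the frames are simple non-overlapping extracts.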

( 3 ) : Model

You can use any model; I also use this one for image classification, telling cats apart from people and trucks.

"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
Model Initialize
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
import tensorflow as tf

target = 10  # number of output classes; assumed to match the 10 labels above

model = tf.keras.models.Sequential([
            tf.keras.layers.InputLayer(input_shape=(88, 80, 1)),
            tf.keras.layers.Reshape((88, 80, 1)),
            tf.keras.layers.Conv2D(32, (8, 8), strides=4, padding='same', activation='relu' ),
            tf.keras.layers.Conv2D(64, (4, 4), strides=2, padding='same', activation='relu' ),
            tf.keras.layers.Conv2D(64, (3, 3), activation='relu' ),
            tf.keras.layers.Flatten(), # layer_3
            tf.keras.layers.Dense(256, activation='relu'),
            tf.keras.layers.Dense(64, activation='relu'),
            tf.keras.layers.Dense(256, activation='relu'),
])

model.add(tf.keras.layers.Flatten())  # no-op here: the output is already flat
model.add(tf.keras.layers.Dense(target, activation=tf.nn.softmax))  # one probability per class
model.summary()
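To check how the (88, 80, 1) input shrinks through the three Conv2D layers, the spatial sizes can be worked out by hand. The sketch below uses the standard output-size rules for 'same' and 'valid' padding; the helper name `conv_out` is hypothetical. Note that the windows in step 2 are reshaped to (1, 147, 15, 1), which does not match this input shape, so one of the two would need to change before training.

```python
import math

def conv_out(size, kernel, stride=1, padding="valid"):
    """Output length of one spatial dimension after a convolution."""
    if padding == "same":
        return math.ceil(size / stride)    # 'same' depends only on the stride
    return (size - kernel) // stride + 1   # 'valid': no padding

h, w = 88, 80
h, w = conv_out(h, 8, 4, "same"), conv_out(w, 8, 4, "same")  # first Conv2D
h, w = conv_out(h, 4, 2, "same"), conv_out(w, 4, 2, "same")  # second Conv2D
h, w = conv_out(h, 3), conv_out(w, 3)                        # third Conv2D, 'valid'
flat_features = h * w * 64  # 64 filters in the last Conv2D, then Flatten
print(h, w, flat_features)
```

The flattened feature count is what the first Dense(256) layer receives, so it is a quick way to see how a different input shape would change the size of the model.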

Compile and run ... Don't forget to save the weights; multiple rounds of training will find the similarities. Your question is like a game: nothing useful happens at the beginning, and only after repeated rounds does the model start trying actions. (14) Sample

...

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Martijn Pieters