Create custom convolution layer and compare two Keras layers

I am currently building a network in Keras to perform harmonic/percussive source separation on an audio spectrogram using a median filtering technique (http://dafx10.iem.at/papers/DerryFitzGerald_DAFx10_P15.pdf).

Given an input magnitude spectrogram S, and denoting the ith time frame as Si and the hth frequency slice as Sh, a percussion-enhanced spectrogram frame Pi can be generated by performing median filtering on Si: Pi = M{Si, lperc}, where M denotes median filtering and lperc is the filter length. The individual percussion-enhanced frames Pi are then combined to yield a percussion-enhanced spectrogram P. Similarly, a harmonic-enhanced spectrogram frequency slice Hh can be obtained by median filtering frequency slice Sh: Hh = M{Sh, lharm}.

Once you have P and H, you can decide whether each time-frequency bin Sh,i belongs to the harmonic or the percussive source: if Hh,i > Ph,i, Sh,i goes to the harmonic spectrogram and takes the value 0 in the percussive spectrogram, and vice versa.

In my network, given the input spectrogram and a specific time frame Si, I need to compute the medians horizontally (along the time axis) for each frequency h. This is easily done with a Lambda layer and TensorFlow:

layer_H = Lambda(lambda x:tf.contrib.distributions.percentile(x[0], 50, axis=0))(layer)

Here, the length of the harmonic median filter lharm is the horizontal length of the input spectrogram. The output is a vector whose size is equal to the number of frequencies (in my case, 88).
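For reference, here is a minimal NumPy sketch of what that Lambda layer computes, assuming the input patch is laid out as (time frames, frequency bins). The patch values and the 5 x 4 size are made up for illustration (the real patch would have 88 columns), and an odd number of frames is used so the median is an actual sample:

```python
import numpy as np

# Hypothetical patch: 5 time frames (rows) x 4 frequency bins (columns).
patch = np.array([
    [1.0, 9.0, 2.0, 0.0],
    [2.0, 8.0, 2.0, 5.0],
    [3.0, 7.0, 2.0, 1.0],
    [4.0, 6.0, 9.0, 0.0],
    [5.0, 5.0, 2.0, 0.0],
])

# Median over time (axis 0), computed independently for every frequency bin.
H = np.median(patch, axis=0)

print(H.shape)     # (4,)
print(H.tolist())  # [3.0, 7.0, 2.0, 0.0]
```

Note how the isolated spike of 9.0 in the third column is ignored by the median, which is why this filter favours sustained (harmonic) energy.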

The next step is where I am stuck right now: I need to compute the medians vertically (along the frequency axis) for the current time frame Si, given the length of the percussive median filter lperc. I want the resulting vector to be the same size as the input, so I have to be careful at each end of the input (the effective size of the filter will be between lperc/2 and lperc depending on the position). This looks like some sort of convolution, for lack of a better word.
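To make the intended edge behaviour concrete, here is a plain NumPy reference of that sliding median, under a few assumptions: the window length is odd, the window is centred on each bin, and it is simply clipped at both ends of the frame. The function name `truncated_median_filter` is hypothetical, and this is not a Keras layer, only something a layer's output could be checked against:

```python
import numpy as np

def truncated_median_filter(frame, length):
    """Median-filter a 1-D frame, shrinking the window at both ends.

    `length` is assumed to be odd. The window around bin k covers bins
    k - length // 2 .. k + length // 2, clipped to the frame, so the
    output has the same size as the input.
    """
    frame = np.asarray(frame, dtype=float)
    half = length // 2
    out = np.empty_like(frame)
    for k in range(frame.shape[0]):
        lo = max(0, k - half)                    # clip at the low end
        hi = min(frame.shape[0], k + half + 1)   # clip at the high end
        out[k] = np.median(frame[lo:hi])
    return out

# Hypothetical frame with 7 frequency bins and a filter length of 3.
S_i = np.array([1.0, 5.0, 2.0, 8.0, 3.0, 9.0, 4.0])
P_i = truncated_median_filter(S_i, length=3)

print(P_i.tolist())  # [3.0, 2.0, 5.0, 3.0, 8.0, 4.0, 6.5]
```

At the two ends the window holds only 2 values here, so `np.median` returns the average of the pair (3.0 and 6.5). Whether that or a padded window is preferable is a design choice the question leaves open.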

Once I have the two resulting vectors Hi and Pi, I want to compare them and assign each value of the original frame Si to either a percussive layer (Lp) or a harmonic layer (Lh). So, I have three different inputs, Hi, Pi and Si, and I want to end up with Lp and Lh by comparing Hi and Pi, and continue building my network from there. If Hi,j > Pi,j, then Lpi,j = 0 and Lhi,j = Si,j.
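The assignment rule can be written as a binary mask. Below is a small NumPy sketch with made-up values for Si, Hi and Pi; sending ties (Hi,j equal to Pi,j) to the percussive layer is an assumption, since only the strict case is specified above:

```python
import numpy as np

# Hypothetical values for one time frame with 4 frequency bins.
S_i = np.array([4.0, 1.0, 7.0, 3.0])
H_i = np.array([5.0, 1.0, 2.0, 3.0])
P_i = np.array([2.0, 6.0, 2.0, 1.0])

# True where the harmonic estimate dominates; ties fall to percussive.
harmonic_mask = H_i > P_i

# Each bin of S_i goes to exactly one of the two layers, the other gets 0.
L_h = np.where(harmonic_mask, S_i, 0.0)
L_p = np.where(harmonic_mask, 0.0, S_i)

print(L_h.tolist())  # [4.0, 0.0, 0.0, 3.0]
print(L_p.tolist())  # [0.0, 1.0, 7.0, 0.0]
```

Because the mask is binary, L_h + L_p reproduces S_i exactly. Inside the network, the same comparison could presumably be expressed in a Lambda layer taking the three tensors as a list, using backend operations such as `K.greater` and `K.cast`; that mapping is a suggestion and has not been tested here.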

To sum up, I am stuck on two different problems:

  1. How to compute the vertical medians (the percussive median filter of length lperc along the frequency axis)?

  2. How to implement in the network the operations that will allow me to go from Hi, Pi and Si to Lp and Lh?

Thank you very much in advance!



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
