Remove noise from the vocals of a song in Python

I'm trying to separate the vocals from a song using a deep learning model. The output is not wrong, but some extra noise makes the signal sound bad.

The following is a 3-second excerpt of the output file where the noise occurs (the noisy areas are marked with rectangles):

Link to the audio file

How can I remove this noise from my output file? I can see that these parts have a different amplitude than the parts of the song I want to keep. Is there a way to filter the signal based on amplitude and only allow a specific amplitude range to exist in my signal?

Thanks

UPDATE: Please see the accepted answer, and my own answer below for a denoising algorithm that works as expected!

Solution 1:^[1]

"How can I remove these noises from my output file?" You could 'window' them out: multiply those parts of the signal by a step function, e.g. 0.001 in the noisy regions and 1 in the regions you want to keep. This silences the noisy regions and keeps your regions of interest. However, it is not generalisable: it only works for a pre-specified audio segment, since the window is fixed.
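A minimal NumPy sketch of that step-function window, assuming you have already picked the noisy regions by hand as (start, end) times in seconds. The function name window_out and the noisy_regions argument are hypothetical, not from the answer:

```python
import numpy as np


def window_out(signal, sample_rate, noisy_regions, noise_gain=0.001):
    """Multiply `signal` by a step function that is `noise_gain` inside each
    hand-picked (start_s, end_s) region and 1.0 everywhere else."""
    mask = np.ones(len(signal))
    for start_s, end_s in noisy_regions:
        start = int(start_s * sample_rate)  # seconds -> sample index
        end = int(end_s * sample_rate)
        mask[start:end] = noise_gain
    return signal * mask
```

For example, window_out(wav, 44100, [(1.2, 1.8)]) would attenuate 1.2 s to 1.8 s by a factor of 1000. This also shows the limitation mentioned above: the regions are hard-coded for one recording.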

"I can see that these parts have a different amplitude than the other parts of the song I want. Is there a way to filter the signal based on these amplitudes and only allow a specific amplitude range to exist in my signal?"

Here you could use one of two approaches: 1) a running window that calculates the energy (the sum of X^2 over N samples, where X is your audio signal), or 2) the Hilbert envelope of your signal, smoothed with a window of appropriate length (perhaps anywhere from 1 ms to a few hundred milliseconds). You can then set a threshold on either the energy or the Hilbert envelope.
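Both approaches could be sketched as follows, assuming a mono NumPy array and a window length n given in samples (e.g. int(0.02 * sample_rate) for 20 ms, with n no longer than the signal). The helper names are hypothetical, the moving-average smoothing is just one possible choice, and the threshold still has to be tuned by hand:

```python
import numpy as np
from scipy.signal import hilbert


def short_time_energy(x, n):
    """Approach 1: running-window energy, the sum of x**2 over an n-sample
    window roughly centred on each sample (assumes n <= len(x))."""
    return np.convolve(x ** 2, np.ones(n), mode="same")


def smoothed_hilbert_envelope(x, n):
    """Approach 2: magnitude of the analytic signal (Hilbert envelope),
    smoothed with an n-sample moving average."""
    envelope = np.abs(hilbert(x))
    return np.convolve(envelope, np.ones(n) / n, mode="same")


def gate(x, feature, threshold):
    """Keep samples whose feature value reaches the threshold; silence the rest."""
    return np.where(feature >= threshold, x, 0.0)
```

For example, gate(x, smoothed_hilbert_envelope(x, n), 0.1) silences every sample whose smoothed envelope is below 0.1. The two features scale differently: the energy grows with n and with the square of the amplitude, while the smoothed envelope stays in amplitude units, so a threshold tuned for one does not carry over to the other.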

Solution 2:^[2]

I used the suggestion from the accepted answer and created the following algorithm, which uses the Hilbert envelope to silence the parts of the song where there is noise but no vocals.

import numpy as np
import scipy as sp
import scipy.signal
import scipy.stats


def hilbert_metrics(signal, sample_rate=44100):
    '''Return the amplitude envelope and the instantaneous frequency (Hz) of the signal.'''
    analytic_signal = sp.signal.hilbert(signal)
    amplitude_envelope = np.abs(analytic_signal)
    instantaneous_phase = np.unwrap(np.angle(analytic_signal))
    # Phase derivative, converted from radians per sample to Hz
    instantaneous_frequency = (np.diff(instantaneous_phase) /
                               (2.0 * np.pi) * sample_rate)
    return amplitude_envelope, instantaneous_frequency


def denoise(wav_file_handler, hop_length: int = 1024, window_length_in_second: float = 0.5,
            threshold_softness: float = 4.0, stat_mode: str = "mean", verbose: int = 0) -> np.ndarray:
    '''Run a sliding window over the wav signal.

    For each segment, check the previous and the next segment. If both have a
    lower amplitude statistic than global_statistic / threshold_softness, the area
    is probably only noise, so the middle segment is silenced as well. If they are
    above that threshold, the segment is probably an actual part of the song.
    This is effective because it searches for noise in the local neighbourhood.
    The lower threshold_softness is, the more aggressive the noise detection becomes.'''
    stat_mode = stat_mode.lower()
    if stat_mode not in ("median", "mean", "mode"):
        raise ValueError(f"expected 'mean', 'median' or 'mode' for `stat_mode` but received: '{stat_mode}'")

    def amps_reducer_function(amps):
        if stat_mode == "median":
            return np.median(amps)
        if stat_mode == "mean":
            return np.mean(amps)
        # scipy.stats.mode returns a ModeResult; extract the mode value itself.
        # Only meaningful for quantised data: continuous floats are almost all unique.
        return np.ravel(sp.stats.mode(amps)[0])[0]

    wav = np.copy(wav_file_handler.wav_file)
    amp, _ = hilbert_metrics(wav, wav_file_handler.sample_rate)
    window_length_frames = int(window_length_in_second * wav_file_handler.sample_rate)
    threshold = amps_reducer_function(amp) / threshold_softness
    muted_segments_count = 0
    for i in range(window_length_frames, len(wav) - window_length_frames, hop_length):
        previous_segment_stat = amps_reducer_function(amp[i - window_length_frames: i])
        next_segment_stat = amps_reducer_function(amp[i + window_length_frames: i + 2 * window_length_frames])
        if previous_segment_stat < threshold and next_segment_stat < threshold:
            if verbose:
                print(f"previous segment stat: {previous_segment_stat}, threshold: {threshold}, next_segment_stat: {next_segment_stat}")
            muted_segments_count += 1
            # Silence only the output; the envelope `amp` is not modified, so
            # later windows are still judged on the original signal
            wav[i: i + window_length_frames] = 0.0
    if verbose:
        print(f"Denoising completed! muted {muted_segments_count} segments")
    return wav

This method could definitely be improved by using a different threshold, or by also applying low-pass and high-pass filters to remove unwanted frequencies.
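As a sketch of the filtering idea, a zero-phase Butterworth band-pass (a high-pass and a low-pass combined in one filter) could be applied before or after denoise. The 80 Hz to 8 kHz default is only an assumed vocal band, not a value from this answer, and should be tuned by ear; the function name bandpass is hypothetical:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass(x, sample_rate, low_hz=80.0, high_hz=8000.0, order=4):
    """Zero-phase Butterworth band-pass: removes content below low_hz
    (high-pass part) and above high_hz (low-pass part).
    high_hz must stay below sample_rate / 2."""
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=sample_rate, output="sos")
    # Forward-backward filtering: no phase shift, so the output stays aligned with x
    return sosfiltfilt(sos, np.asarray(x, dtype=float))
```

Because sosfiltfilt runs the filter forward and then backward, the vocals are not shifted in time, which matters if the filtered signal is later compared sample-by-sample with the original or with the output of denoise.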

Here is an example of running the method on a WAV signal, where you can see the denoising effect:

This is the original signal:

This is the denoised signal using the default parameters:

This is the same signal denoised with threshold_softness = 2 instead of 4:

This is the same denoising run as the previous one, but with stat_mode="median" (np.median instead of np.mean), which made the method run much faster and gave a similar result:

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

[1] Solution 1: Thejasvi
[2] Solution 2
