How to use tfa.image.sparse_image_warp?
I have the log mel spectrograms of a few audio clips and I am trying to augment the spectrograms using tfa.image.sparse_image_warp so that time warping can be achieved as done in Google's SpecAugment.
But I am confused about how to achieve time warping, as the documentation does not specify how to initialize the arguments to sparse_image_warp.
The method declaration is like this:
tfa.image.sparse_image_warp(
    image: tfa.types.TensorLike,
    source_control_point_locations: tfa.types.TensorLike,
    dest_control_point_locations: tfa.types.TensorLike,
    interpolation_order: int = 2,
    regularization_weight: tfa.types.FloatTensorLike = 0.0,
    num_boundary_points: int = 0,
    name: str = 'sparse_image_warp'
) -> tf.Tensor
Can someone point out how to initialize source_control_point_locations, dest_control_point_locations and num_boundary_points?
Solution 1:[1]
I think I can answer this because we are reading the same paper.
Please note that although I managed to make the code work, I do not fully understand the theory of warping.
As far as I understand it, warping moves a pixel to another position.
Therefore, source_control_point_locations specifies the source pixels, while dest_control_point_locations specifies the corresponding target positions. Both hold coordinates, hence the shape [batch_size, num_control_points, 2].
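I don't know exactly how num_boundary_points works internally. According to the documentation, it adds zero-flow control points along the image edges (1 gives the four corners, 2 adds the midpoint of each edge, and so on), which keeps the boundary fixed. What I do know is that if you don't fix the boundary points, you might get the error "Input matrix is not invertible. [Op:MatrixSolve]." (It seems that the sparse control point flows are interpolated to a dense flow field, and that interpolation solves a linear system, hence the matrix operation.)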
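import numpy as np
import tensorflow as tf
import tensorflow_addons as tfa

mspec = np.random.randn(128, 501).astype("float32")  # spectrogram, shape [freq, time]
# Control points are (row, column) = (frequency bin, time step), shape [batch_size, num_control_points, 2]
src = tf.constant([[[64, 1]]], dtype=tf.float32)   # 64 is the center frequency bin
dst = tf.constant([[[64, 50]]], dtype=tf.float32)  # move the pixel from time step 1 to 50
# num_boundary_points=2 pins the corners and edge midpoints so the boundary stays fixed
# The call returns a tuple: (warped image, dense flow field)
warped, flow = tfa.image.sparse_image_warp(mspec, src, dst, num_boundary_points=2)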
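The src and dst values above are hard-coded. For augmentation you would probably want to draw them at random, the way SpecAugment describes: pick a point on the center-frequency line with a time step between W and tau - W, then shift it left or right by up to W steps. Here is a minimal sketch of that selection using only NumPy. The helper name time_warp_control_points and the parameter max_warp (the paper's W) are my own choices and not part of tensorflow_addons, and drawing an integer shift is a simplification on my part.

```python
import numpy as np


def time_warp_control_points(num_freq, num_time, max_warp, rng=None):
    """Pick one SpecAugment-style control point pair (hypothetical helper).

    Returns (src, dst), each float32 with shape [1, 1, 2], holding
    (frequency row, time column) coordinates.
    """
    if not 0 < max_warp < num_time / 2:
        raise ValueError("max_warp must be positive and less than num_time / 2")
    rng = np.random.default_rng() if rng is None else rng

    center_row = num_freq // 2  # horizontal line through the middle of the image
    # Source time step in [max_warp, num_time - max_warp), so the shifted
    # point always stays inside the spectrogram.
    t_src = int(rng.integers(max_warp, num_time - max_warp))
    # Signed shift in [-max_warp, max_warp]: negative is left, positive is right.
    shift = int(rng.integers(-max_warp, max_warp + 1))

    src = np.array([[[center_row, t_src]]], dtype=np.float32)
    dst = np.array([[[center_row, t_src + shift]]], dtype=np.float32)
    return src, dst


src, dst = time_warp_control_points(128, 501, max_warp=50,
                                    rng=np.random.default_rng(0))
print(src.shape, dst.shape)  # (1, 1, 2) (1, 1, 2)
```

The resulting arrays should be usable as source_control_point_locations and dest_control_point_locations in the call above, with num_boundary_points kept greater than 0 so the edges of the spectrogram stay in place.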
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
