'TensorFlow tfp.math.find_root_chandrupatla() method doesn't work correctly

I'm currently translating a numpy/scipy run code I have to work in a tensorflow setting. The function I am trying to emulate is scipy.optimize.fminbound. This takes in a function, a lower and upper bound, and finds the minimum value for that function within the given bounds.

For me, the important issue is to be able to quickly take care of batched sets of matrices.

I found the TF function tfp.math.find_root_chadrupatla that seems to do the same thing, but really doesn't. It mostly goes in the opposite direction of 0. Here is some example code I made that shows the issue:

(note: the function is very similar to what I actually have to calculate)

In[]:
matrix1 = tf.constant([[1,2], [3,4]], tf.float32)[tf.newaxis, ...]
matrix2 = tf.constant([[[1,2], [3,4]], [[2,4], [6,8]]], tf.float32)

def func(tensor):
    expanded_tensor = tensor[..., tf.newaxis, tf.newaxis]
    offset_matrix1 = expanded_tensor * matrix1

    # now, offset_matrix1.shape == matrix2.shape
    diff_matrix = offset_matrix1 - matrix2

    norm = tf.norm(diff_matrix, axis=[-2, -1])
    return norm

tfp.math.find_root_chandrupatla(func, 
    low=tf.constant([-5, -5], tf.float32),
    high=tf.constant([5, 5], tf.float32))


Out[]:
RootSearchResults(
    estimated_root=<tf.Tensor: shape=(2,), dtype=float32, numpy=array([5., 5.], dtype=float32)>, 
    objective_at_estimated_root=<tf.Tensor: shape=(2,), dtype=float32, numpy=array([21.908901, 16.431677], dtype=float32)>, 
    num_iterations=<tf.Tensor: shape=(2,), dtype=int32, numpy=array([18, 32], dtype=int32)>)

while the true optimal answer would be

estimated_root=[1., 2.]
objective_at_estimated_root=[0., 0.]

If we follow the estimators path for 10 iterations, we can see the tensors it tests:

Out[]:
testing: [-5. -5.] norm: [32.863354 38.34058 ]
testing: [5. 5.] norm: [21.908901 16.431677]
testing: [0. 0.] norm: [ 5.4772253 10.954451 ]
testing: [2.5 2.5] norm: [8.215838  2.7386127]
testing: [3.75 3.75] norm: [15.06237   9.585145]
testing: [3.75 3.75] norm: [15.06237   9.585145]
testing: [4.375 4.375] norm: [18.485636 13.00841 ]
testing: [4.375 4.375] norm: [18.485636 13.00841 ]
testing: [4.6875 4.6875] norm: [20.19727  14.720043]
testing: [4.6875 4.6875] norm: [20.19727  14.720043]

But if by chance the initial conditions are set so that their average is the answer, namely [-5, -5] and [7, 9], we get the right result:

Out[]:
testing: [-5. -5.] output: [32.863354 38.34058 ]
testing: [7. 9.] output: [32.863354 38.34058 ]
testing: [1. 2.] output: [0. 0.]

Is there a better function for the job?

For reference, this is the same function but in numpy/scipy:

In[]:
matrix1 = np.array([[1,2], [3,4]])
matrices2 = np.array([[[1,2], [3,4]], [[2,4], [6,8]]])

for matrix2 in matrices2:
    def func(a):
        offset_matrix1 = a * matrix1
        diff_matrix = offset_matrix1 - matrix2
        norm = np.linalg.norm(diff_matrix)
        return norm

    minimum = scipy.optimize.fminbound(func, -5, 5)
    print("offset:", round(minimum, 2), "result:", round(func(minimum), 2))


Out[]:
offset: 1.0 result: 0.0
offset: 2.0 result: 0.0


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source