Optimizing objective function in SciPy optimize.minimize

import numpy as np
from scipy.optimize import minimize

res_M = minimize(L_M, x0=x_M, args=(data, w_vector),
                 method='L-BFGS-B', bounds=[(0.001, 1), (0.001, 1), (0.001, 1)])

def L_M(x, data, w_vector):
    total = 0
    for i in range(len(data)):
        total += w_vector[i] * (data[i][0]*np.log(x[0])
                                + data[i][1]*np.log(x[1])
                                + data[i][2]*np.log(x[2]))
    return -total

As part of an Expectation-Maximization (EM) algorithm, I am calling SciPy's optimize.minimize function in the M-step. x_M holds three values between 0 and 1, all initially 0.5. w_vector is calculated in the E-step; it is a 1-D NumPy array, the same length as the data set, of floats in the range 0 to 1. Each row of the data set holds three integer feature values between 0 and 3, for example [1 0 2].

The for loop in the objective function is slowing things down, so I want to replace it with vectorized calculations. I have tried the following, but it changes the result:

def L_M(x, data, w_vector):
    length = len(data)
    a_i = data[np.arange(length)][0].sum()
    f_i = data[np.arange(length)][1].sum()
    l_i = data[np.arange(length)][2].sum()
    sum = (w_vector[np.arange(length)].sum()) * (a_i*np.log(x[0]) + f_i*np.log(x[1]) + l_i*np.log(x[2]))
    return -1*sum

The minimize function is called many times, and I hope to run it on some very large data sets, so any ideas on how to rewrite it would be much appreciated.
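For reference, one vectorized formulation that reproduces the loop exactly is a weighted dot product: each row of data dotted with np.log(x) gives that sample's log-likelihood term, and weighting by w_vector and summing replaces the loop. This is a sketch, assuming data is an (n, 3) NumPy integer array and w_vector a length-n float array, as described above:

```python
import numpy as np

def L_M(x, data, w_vector):
    # data @ np.log(x) computes, per row i:
    #   data[i][0]*log(x[0]) + data[i][1]*log(x[1]) + data[i][2]*log(x[2])
    # np.dot then applies the per-sample weights and sums, matching the loop.
    return -np.dot(w_vector, data @ np.log(x))
```

Note that the attempted rewrite above factors the weights out of the sum, i.e. it computes (Σ w_i)·(Σ_i per-row term) rather than Σ_i w_i·(per-row term), which is why it changes the result; keeping the weighting inside the dot product preserves the original objective.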



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
