'automatic differentiation used on 'real dataset' settles in false minima why?

Hi so I've a done a project where we use tensorflow in automatic differentiation. Using a fairly linear dataset generated with numpy like so:

true_w, true_b = 7., 4.

def create_batch(batch_size=64):
    x = np.random.randn(batch_size, 1)
    y = np.random.randn(batch_size, 1) + true_w * x+true_b
    return x, y

when i try and repeat automatic differentiation with any other 'real' dataset from kaggle the weight and bias divert away from the intercept and coefficient from sklearn's or numpy's linear regression functions. even using highly correlated features. the following is using the Whisker-high Whisker-Low dataset from Kaggles World Happiness index 2022. Tried other but these two have a very high correlation, I was assuming this would be the best attempt.

X = np.array(df['Whisker-high']).reshape(-1,1)
y = np.array(df['Whisker-low'])

reg = LinearRegression(fit_intercept=True).fit(X,y)

intercept = np.round(reg.intercept_,4)
coef = np.round(reg.coef_[0],4)

iters = 100
lr = .01

w_history = []
b_history = []

true_w = coef
true_b = intercept

w = tf.Variable( 0.65)
b = tf.Variable(1.5)

for i in range(0, iters):
    inds = np.random.choice(np.arange(0, len(df)), size=100, replace=True)
    X = np.array(df.iloc[list(inds)]['Whisker-high']).reshape(-1,1)
    y = np.array(df.iloc[list(inds)]['Whisker-low'])
    x_batch = tf.convert_to_tensor(X, dtype=tf.float32)
    y_batch = tf.convert_to_tensor(y, dtype=tf.float32)
    with tf.GradientTape(persistent=True) as tape:
        y = b + w *x_batch
        loss = tf.reduce_mean(tf.square( y - y_batch))
    dw = tape.gradient(loss, w)
    db = tape.gradient(loss, b)
    
    del tape
    
    w.assign_sub(lr*dw)
    b.assign_sub(lr*db)
    
    w_history.append(w.numpy())
    b_history.append(b.numpy())
    
    if i %10==0:
        print('iter{}, w={}, b={}'.format(i, w.numpy(), b.numpy()))

plt.plot(range(iters), w_history, label ='learned w')
plt.plot(range(iters), b_history, label ='learned b')
plt.plot(range(iters),[true_w] *iters, label='true w')
plt.plot(range(iters),[true_b] *iters, label='true b')
plt.legend()
plt.show()

although with automatic differentiation the weights and bias do seem to settle in a minima, a simple line plot over the data shows that it would be generous to says it's representative of the dataset.

plt.figure(figsize=(6,6))
plt.scatter(df['speeding'], df['alcohol'])
xseq = np.linspace(0, 9, num=df.shape[0])
plt.plot(xseq, b_history[-1] + w_history[-1]*xseq, color='green')
plt.xlabel('speeding', fontsize=16)
plt.ylabel('alcohol', fontsize=16)
plt.show()


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source