Why is my loss function tending to infinity, yet it works appropriately when the x and y coordinates are swapped?
I have a cookie-cutter linear regression PyTorch model that is meant to predict an individual's expected years of experience from their salary. A visualisation of the dataset can be viewed below.
The parameters are as follows:
model = LinearRegressionModel(1, 1) # single dimension
criterion = nn.MSELoss(reduction = "mean") # mean squared error, minimise total loss
learning_rate = 5e-4
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate) # Stochastic Gradient Descent
EPOCHS = 10000
model = model.double()
The model is as follows:
class LinearRegressionModel(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(LinearRegressionModel, self).__init__()
        self.linear = nn.Linear(input_dim, output_dim)

    def forward(self, x):
        out = self.linear(x)
        return out
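For reference, my training function is the standard SGD loop; a simplified sketch (the variable names here are illustrative, not copied verbatim from my code):

x_train_t = torch.from_numpy(x_train).double().reshape(-1, 1)  # shape (N, 1) to match nn.Linear(1, 1)
y_train_t = torch.from_numpy(y_train).double().reshape(-1, 1)

for epoch in range(EPOCHS):
    optimizer.zero_grad()                  # clear gradients from the previous step
    outputs = model(x_train_t)             # forward pass
    loss = criterion(outputs, y_train_t)   # MSE between prediction and target
    loss.backward()                        # backpropagate
    optimizer.step()                       # SGD weight update
    if epoch % 5 == 0:
        print(f"epoch {epoch}, loss {loss.item()}")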
When I apply my training function to the dataset, I get the following output:
epoch 0, loss 8369994489.500052
epoch 5, loss 5.837550943575215e+79
epoch 10, loss 4.071328967185933e+149
epoch 15, loss 2.8394989130314087e+219
epoch 20, loss 1.9803740110638713e+289
epoch 25, loss inf
epoch 30, loss inf
epoch 35, loss inf
epoch 40, loss inf
epoch 45, loss nan
Where my test train split is as follows:
def test_train_split(df):
    training_data = df.sample(frac=0.8, random_state=25)
    testing_data = df.drop(training_data.index)
    y_train, x_train = (
        training_data["YearsExperience"].to_numpy(),
        training_data["Salary"].to_numpy(),
    )
    y_test, x_test = (
        testing_data["YearsExperience"].to_numpy(),
        testing_data["Salary"].to_numpy(),
    )
    return x_train, y_train, x_test, y_test
However, when I swap my X and Y values, changing the model to predict an individual's salary from their years of experience, training gives the following output:
epoch 0, loss 9643590644.01929
epoch 5, loss 1910502419.8189254
epoch 10, loss 394543586.1592383
epoch 15, loss 97350361.21930182
epoch 20, loss 39076027.76543479
epoch 25, loss 27637810.070729867
epoch 30, loss 25381050.43396528
epoch 35, loss 24924174.726827644
epoch 40, loss 24820147.1727601
epoch 45, loss 24785300.2845243
epoch 50, loss 24764025.725635834
epoch 55, loss 24745422.391813274
epoch 60, loss 24727353.293723747
The output above is working as intended.
Here the test/train split is as follows; notice the order of the tuples:
def test_train_split(df):
    training_data = df.sample(frac=0.8, random_state=25)
    testing_data = df.drop(training_data.index)
    x_train, y_train = (
        training_data["YearsExperience"].to_numpy(),
        training_data["Salary"].to_numpy(),
    )
    x_test, y_test = (
        testing_data["YearsExperience"].to_numpy(),
        testing_data["Salary"].to_numpy(),
    )
    return x_train, y_train, x_test, y_test
So my question is: why is this happening? AFAIK the model doesn't care what the data represents, since it is only adjusting its parameters to minimise the loss, so why do I get a working solution only when I flip the axes?
How can I fix my model so it works for my intended question: predicting someone's expected years of experience given their salary as input?
To get the model working with my intended question, I have tried:
Tweaking learning rate
Trying different optimisers
Trying different loss
Changing epoch size
Solution 1:
The issue arose from an exploding gradient in the loss: I was using MSELoss and the dataset contains large numbers (salaries on the order of tens of thousands). The mean squared error is therefore enormous right from the start, and because the salary inputs themselves are large, the gradient of the squared error with respect to the weight is roughly the input times the prediction error. With lr = 5e-4, each stochastic gradient descent step overshoots by a wider margin than the last, so the loss grows instead of shrinking, and within a few iterations the squared values overflow to np.inf (and then nan), after which the gradients can no longer recover.
The solution: scale the dataset down, while also drastically decreasing the learning rate; I tried values around 1e-8.
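A minimal sketch of that approach, assuming the same dataframe and column names as in the question (the standardisation itself is illustrative, not the exact code used):

# Standardise both columns before splitting, so inputs and targets are of order 1
x_mean, x_std = df["Salary"].mean(), df["Salary"].std()
y_mean, y_std = df["YearsExperience"].mean(), df["YearsExperience"].std()

df_scaled = df.copy()
df_scaled["Salary"] = (df["Salary"] - x_mean) / x_std
df_scaled["YearsExperience"] = (df["YearsExperience"] - y_mean) / y_std

x_train, y_train, x_test, y_test = test_train_split(df_scaled)
# Predictions can be mapped back to years via: years = prediction * y_std + y_mean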
Or
Use a different loss function. This is the solution I went with: I used the mean absolute error (MAE), because of its properties, which I learned about from this resource: https://neptune.ai/blog/pytorch-loss-functions
When could it be used?
Regression problems, especially when the distribution of the target variable has outliers, such as small or big values that are a great distance from the mean value. It is considered to be more robust to outliers.
The property above fits my use case, so I switched to it, which stopped the exploding gradient issue.
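In PyTorch this is a one-line change to the criterion, roughly as below; everything else in the setup stays the same:

criterion = nn.L1Loss(reduction="mean")  # mean absolute error instead of nn.MSELoss
# The loss now grows linearly with the error rather than quadratically,
# so large salary values no longer produce squared (and overflowing) loss terms.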
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Murat Saglam |

