The simple MLP NN for regression in PyTorch - very slow learning - rev2
After some days spent with PyTorch I ended up with a neural network that, despite being quite a good predictor, is extremely slow to learn. It is an MLP with 54 input neurons, 27 hidden neurons with a sigmoid activation function, and one linear output neuron. Currently, running the NN for 20,000 epochs takes around 20 minutes. I had some experience with a PyTorch MLP of the same architecture, but 'created from scratch' - without bias - which was worse in terms of predictive capabilities, but its whole training lasted less than 30 s.
The reason I created the new NN is that the model is now much more flexible (changing the number of neurons, the number of layers, or the activation functions takes seconds). I also tried to use as many built-in tools as possible, so there was no problem with e.g. introducing bias to the neurons.
The code is as follows (I skipped the imports):
Hyperparameters:
hyperparam_input_neurons = 54
hyperparam_hidden_neurons_1 = 27
hyperparam_output_neurons = 1
param_learning_rate = 0.01
param_weight_decay = 1e-6
param_momentum = 0.9
param_epochs = 2000
param_test_data_fraction = 0.5
loss_function = nn.MSELoss()
Training data:
train = pd.read_csv('input.csv')
Xf = torch.tensor(train.values, dtype=torch.float)
res = pd.read_csv('output.csv')
yf = torch.tensor(res.values, dtype=torch.float)
ntrainingelems = int((len(yf) + 1) * param_test_data_fraction)
Xt = Xf[:ntrainingelems]    # training inputs
yt = yf[:ntrainingelems]    # training targets
Xv = Xf[ntrainingelems:]    # validation inputs
yv = yf[ntrainingelems:]    # validation targets
traintensor = TensorDataset(Xt, yt)
validtensor = TensorDataset(Xv, yv)
# each loader delivers the whole set as a single batch
trainloader = DataLoader(traintensor, batch_size=ntrainingelems, shuffle=False)
validloader = DataLoader(validtensor, batch_size=(len(yf) - ntrainingelems), shuffle=False)
NN definition:
class Model(nn.Module):
    def __init__(self):
        super().__init__()
        self.hidden = nn.Linear(hyperparam_input_neurons, hyperparam_hidden_neurons_1)
        self.output = nn.Linear(hyperparam_hidden_neurons_1, hyperparam_output_neurons)

    def forward(self, x):
        x = self.hidden(x)
        x = torch.sigmoid(x)
        x = self.output(x)
        return x
model = Model()
Learning:
epoch_number = []
mse_loss_t = []
mse_loss_v = []
optimizer = optim.SGD(model.parameters(), lr=param_learning_rate, weight_decay=param_weight_decay, momentum=param_momentum, nesterov=True)
for epoch in range(1, param_epochs + 1):
    train_loss, valid_loss = [], []
    epoch_number.append(int(epoch))
    model.train()
    for data, target in trainloader:
        optimizer.zero_grad()
        output = model(data)
        loss = loss_function(output, target)
        loss.backward()
        optimizer.step()
        train_loss.append(loss.item())
    mse_loss_t.append(np.mean(train_loss))
    model.eval()
    for data, target in validloader:
        output = model(data)
        loss = loss_function(output, target)
        valid_loss.append(loss.item())
    mse_loss_v.append(np.mean(valid_loss))
    if epoch == 1 or epoch % 100 == 0:
        print("Epoch:", epoch, "Training Loss: ", np.mean(train_loss), "Valid Loss: ", np.mean(valid_loss))
Do you have any idea what is wrong here? Or how to make the learning quicker (at least 30x)?
FYI, the very quick-to-learn 'scratch-based' NN I mentioned earlier used a different definition of the forward method, so I guessed this may be the reason (see the code below)... but if I understand the documentation correctly, nn.Linear can also do quick parallel computations.
def forward(self, X):
    self.z = torch.matmul(X, self.W1)
    self.z2 = self.sigmoid(self.z)
    self.z3 = torch.matmul(self.z2, self.W2)
    return self.z3    # linear output
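For reference, here is a quick sketch (toy sizes, nothing from my data) confirming that nn.Linear performs the same batched matrix multiplication as the hand-written matmul, just with a bias term added:

import torch
import torch.nn as nn

lin = nn.Linear(54, 27)
X = torch.randn(8, 54)                  # a small batch of 8 samples
manual = X @ lin.weight.T + lin.bias    # hand-rolled matmul + bias
print(torch.allclose(lin(X), manual))   # True - nn.Linear is just as "parallel"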
Solution 1:[1]
I think I finally found the issue - but your comments are welcome. After fixing the tensor shapes (to avoid a broadcast (N, N) loss), I removed the DataLoaders and 'fed' the data to the model directly.
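For context, a minimal sketch of the shape pitfall this guards against (toy sizes, not the real data): if the prediction is flattened to (N) while the target stays (N, 1), MSELoss broadcasts the pair to an (N, N) matrix and only prints a warning, so the reported loss is misleading.

import torch
import torch.nn as nn

loss_fn = nn.MSELoss()
pred = torch.randn(5)                         # (5,)  - flattened model output
target = torch.randn(5, 1)                    # (5, 1) - target still a column

mismatched = loss_fn(pred, target)            # broadcasts to (5, 5) -> misleading value (warning only)
matched = loss_fn(pred, target.flatten())     # shapes agree -> the intended element-wise MSE
print(mismatched.item(), matched.item())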
First change - in the NN object, flattening the output to size (N) instead of (N, 1):
def forward(self, x):
    x = self.hidden(x)
    x = torch.sigmoid(x)
    x = (self.output(x)).flatten()
    return x
Second change - changing the shape of the tensors:
traintensor_X = torch.squeeze(Xt, 1)   # Xt has no singleton dim, so its shape is unchanged
traintensor_y = torch.squeeze(yt, 1)   # (N, 1) -> (N)
validtensor_X = torch.squeeze(Xv, 1)
validtensor_y = torch.squeeze(yv, 1)   # (N, 1) -> (N)
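An equivalent way to get the same (N) shape (a sketch on a toy tensor, not the original data) is .view(-1) or .flatten():

import torch

yt_toy = torch.randn(6, 1)                # stand-in for an (N, 1) target column
squeezed = torch.squeeze(yt_toy, 1)       # (6,)
flattened = yt_toy.view(-1)               # also (6,)
print(torch.equal(squeezed, flattened))   # True - both remove the singleton dimension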
Then the last, third change - removing the old DataLoaders and the "for data, target in trainloader" subloop. Instead, only one "main" loop is used (the one over epochs):
for epoch in range(1, param_epochs + 1):  ## run the model for x epochs
    train_loss, valid_loss = [], []
    epoch_number.append(int(epoch))
    ## training part
    model.train()
    optimizer.zero_grad()
    ## 1. forward propagation
    output = model(traintensor_X)
    ## 2. loss calculation
    loss = loss_function(output, traintensor_y)
    ## 3. backward propagation
    loss.backward()
    ## 4. weight optimization
    optimizer.step()
    train_loss.append(loss.item())
    if epoch == param_epochs:
        print("T train size: ", traintensor_X.size())
        print("T target size: ", traintensor_y.size())
        print("T output size: ", output.size())
        print("T loss size: ", loss.size())
    ## loss at each epoch (training set)
    mse_loss_t.append(np.mean(train_loss))
    ## evaluation part
    model.eval()
    output = model(validtensor_X)
    loss = loss_function(output, validtensor_y)
    valid_loss.append(loss.item())
    ## loss at each epoch (validation set)
    mse_loss_v.append(np.mean(valid_loss))
    if epoch == param_epochs:
        print("V train size: ", validtensor_X.size())
        print("V target size: ", validtensor_y.size())
        print("V output size: ", output.size())
        print("V loss size: ", loss.size())
    if epoch == 1 or epoch % 100 == 0:
        print("Epoch:", epoch, "Training Loss: ", np.mean(train_loss), "Valid Loss: ", np.mean(valid_loss))
I checked the dimensions of the tensors, and they seem to be OK:
T train size: torch.Size([4289, 54])
T target size: torch.Size([4289])
T output size: torch.Size([4289])
V train size: torch.Size([2209, 54])
V target size: torch.Size([2209])
V output size: torch.Size([2209])
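A possible further tweak: the evaluation forward pass does not need gradients, so it can be wrapped in torch.no_grad() to skip building the autograd graph. A minimal sketch, assuming the model, loss_function and validation tensors defined above:

model.eval()
with torch.no_grad():                     # no autograd graph for the validation pass
    output = model(validtensor_X)
    loss = loss_function(output, validtensor_y)
valid_loss.append(loss.item())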
If you have any feedback, I'd really appreciate it. Maybe I've made some silly mistake? The RMSE values are very similar to the ones from the NN with DataLoaders.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | prz |
