Custom training steps with sliding window in TensorFlow (from PyTorch)

I'm working on a custom transformer model where the training step looks like this:

# simplified version of my training method, where model = myTransformerModel()
for window in data:  # step through data
    l1 = model(window)
    loss = torch.mean(l1)
    optimizer.zero_grad()
    loss.backward(retain_graph=True)
    optimizer.step()
scheduler.step()

I'm trying to recreate this in TensorFlow; currently it's like this:

for window in data:  # step through data
    with tf.GradientTape() as tape:
        l1 = model(window)
        loss = tf.reduce_mean(l1)
    train = optimizer.minimize(loss, var_list=model.trainable_variables, tape=tape)

This works, but it causes the learning-rate schedule to step with every window, which throws off the learning rate. I have also tried this in place of the minimize line:

gradients = tape.gradient(loss, model.trainable_variables)
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
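As a point of comparison, one way to get PyTorch-style per-epoch scheduling is to keep the tape.gradient/apply_gradients pair per window, but not attach any schedule to the optimizer (a LearningRateSchedule passed to a tf.keras optimizer is evaluated on every apply_gradients call). Instead, assign a new learning rate once per epoch, mirroring scheduler.step(). A minimal runnable sketch, where the Dense model, random windows, and decay factor are all stand-in assumptions:

```python
import numpy as np
import tensorflow as tf

# Stand-ins (assumptions): a tiny Dense model instead of
# myTransformerModel, random windows, and a made-up decay factor.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
data = [np.random.rand(4, 8).astype("float32") for _ in range(3)]
decay_rate = 0.5  # hypothetical per-epoch decay, plays the role of the scheduler

for epoch in range(2):
    for window in data:  # step through data, one update per window
        with tf.GradientTape() as tape:
            l1 = model(window, training=True)
            loss = tf.reduce_mean(l1)
        gradients = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    # Equivalent of scheduler.step(): decay once per epoch, not per window
    optimizer.learning_rate.assign(float(optimizer.learning_rate) * decay_rate)
```

After two epochs the learning rate has decayed twice (0.1 → 0.05 → 0.025), no matter how many windows each epoch contains.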

Is there a good way to make the TensorFlow model behave more like the PyTorch one? Is there a better way to implement my steps with the GradientTape?
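On the second question: a common refinement of such a custom loop is to wrap the per-window step in tf.function so it runs as a compiled graph rather than eagerly. A sketch under the same stand-in assumptions (placeholder Dense model, hypothetical input width of 8):

```python
import tensorflow as tf

# Placeholder model and optimizer (assumptions, not the asker's actual setup)
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

@tf.function  # traces the step into a graph; subsequent calls reuse it
def train_step(window):
    with tf.GradientTape() as tape:
        l1 = model(window, training=True)
        loss = tf.reduce_mean(l1)
    gradients = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))
    return loss
```

The outer window loop then just calls train_step(window); learning-rate changes still happen outside the compiled step, once per epoch.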



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
