'TensorBoard Callback in Keras does not respect initial_epoch of fit?

I'm trying to train multiple models in parallel on a single graphics card. To achieve that I need to resume training of models from saved weights which is not a problem. The model.fit() method has even a parameter initial_epoch that lets me tell the model which epoch the loaded model is on. However when i pass a TensorBoard callback to the fit() method in order to monitor the training of the models, on Tensorboard all data is shown on x=0.

Is there a ways to overcome this and adjust the epoch on tensorboard?

By the way: Im running Keras 2.0.6 and Tensorflow 1.3.0.

self.callbacks = [TensorBoardCallback(log_dir='./../logs/'+self.model_name, histogram_freq=0, write_graph=True, write_images=False, start_epoch=self.step_num)]
self.model.fit(x=self.data['X_train'], y=self.data['y_train'], batch_size=self.input_params[-1]['batch_size'], epochs=1, validation_data=(self.data['X_test'], self.data['y_test']), verbose=verbose, callbacks=self.callbacks, shuffle=self.hyperparameters['shuffle_data'], initial_epoch=self.step_num)
self.model.save_weights('./weights/%s.hdf5'%(self.model_name))
self.model.load_weights('./weights/%s.hdf5'%(self.model_name))
self.model.fit(x=self.data['X_train'], y=self.data['y_train'], batch_size=self.input_params[-1]['batch_size'], epochs=1, validation_data=(self.data['X_test'], self.data['y_test']), verbose=verbose, callbacks=self.callbacks, shuffle=self.hyperparameters['shuffle_data'], initial_epoch=self.step_num)
self.model.save_weights('./weights/%s.hdf5'%(self.model_name))

The resulting graph on Tensorboard looks like this which is not what i was hoping for: enter image description here

Update:

When passing epochs=10 to the first model.fit() the 10 epoch results are displayed in TensorBoard (see picture).

However when reloading the model and running it (with the same callback attached) the on_epoch_end method of the callback gets never called.

enter image description here



Solution 1:[1]

Say we just started fitting our model and our first time epoch count is 30 (please ignore other paramterers just look at epochs and initial_epoch) model.fit(train_dataloader,validation_data = test_dataloader,epochs =30,steps_per_epoch = len(train_dataloader),callbacks = callback_list)

Now say ,after 30 epoch we want to start again from 31st epoch (you can see this in tesnorboard) by changing our Adam optimizer(or nay optimizer) learning rate

so we can do is

model.optimizer.learning_rate = 0.0005

model1.fit(train_dataloader,validation_data = test_dataloader,initial_epoch=30,epochs =55,steps_per_epoch = len(train_dataloader),callbacks = callback_list)

=> So here initial_epoch= where we have left training last time; epochs= initial_epoch+num_epoch we want to run for this second fit

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Dharman