VAE reconstruction loss (MSE) not decreasing, but KL Divergence is
I've been trying to build an LSTM VAE to reconstruct multivariate time-series data in TensorFlow. To start off, I attempted to adapt the approach taken here (switching to the Functional API and changing the layers) and came up with the following code:
import tensorflow as tf
import tensorflow_probability as tfp
from tensorflow.keras import callbacks
from tensorflow.keras.layers import Input, LSTM, Dense, TimeDistributed
from tensorflow.keras.models import Model

tfd = tfp.distributions
tfpl = tfp.layers

input_shape = 13
latent_dim = 2

# Standard-normal prior over the latent space
prior = tfd.Independent(tfd.Normal(loc=tf.zeros(latent_dim), scale=1),
                        reinterpreted_batch_ndims=1)

# Encoder
input_enc = Input(shape=[512, input_shape])
lstm1 = LSTM(latent_dim * 16, return_sequences=True)(input_enc)
lstm2 = LSTM(latent_dim * 8, return_sequences=True)(lstm1)
lstm3 = LSTM(latent_dim * 4, return_sequences=True)(lstm2)
lstm4 = LSTM(latent_dim * 2, return_sequences=True)(lstm3)
lstm5 = LSTM(latent_dim, return_sequences=True)(lstm4)
lat = Dense(tfpl.MultivariateNormalTriL.params_size(latent_dim))(lstm5)

# Latent distribution, regularised towards the prior via a KL penalty
reg = tfpl.MultivariateNormalTriL(
    latent_dim,
    activity_regularizer=tfpl.KLDivergenceRegularizer(prior, weight=1.0))(lat)

# Decoder
lstm6 = LSTM(latent_dim, return_sequences=True)(reg)
lstm7 = LSTM(latent_dim * 2, return_sequences=True)(lstm6)
lstm8 = LSTM(latent_dim * 4, return_sequences=True)(lstm7)
lstm9 = LSTM(latent_dim * 8, return_sequences=True)(lstm8)
lstm10 = LSTM(latent_dim * 16, return_sequences=True)(lstm9)
output_dec = TimeDistributed(Dense(input_shape))(lstm10)

enc = Model(input_enc, reg)
vae = Model(input_enc, output_dec)

vae.compile(optimizer='adam',
            loss='mse',
            metrics=['mse'])

es = callbacks.EarlyStopping(monitor='val_loss',
                             mode='min',
                             verbose=1,
                             patience=5,
                             restore_best_weights=True)

vae.fit(tf_train,
        epochs=1000,
        callbacks=[es],
        validation_data=tf_val,
        shuffle=True)
Observing the MSE as a metric, I've noticed that it does not change during training; only the KL divergence goes down. When I then set the activity_regularizer argument to None, the MSE did indeed go down. So it seems that the KL divergence term is preventing the reconstruction error from being optimised.
Why is that? Am I doing anything obviously wrong? Any help greatly appreciated!
(I'm aware the latent dimension is rather small, I set it to two to easily visualise it, though this behaviour still occurs with larger latent dimensions, hence I don't think the problem lies there.)
Solution 1:[1]
Could it be that you are using an autoencoder whose loss includes a KL divergence term? In a (beta-)VAE the loss is

Loss = MSE + beta * KL

Since beta = 1 corresponds to a standard VAE, you could try making beta smaller than one. This gives more weight to the MSE and less to the KL divergence, which should help the reconstruction, but is undesirable if you want a disentangled latent space.
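As a sanity check on how beta rescales the two components, here is a minimal pure-Python sketch using the closed-form KL between a diagonal Gaussian posterior and a standard-normal prior. The MSE value and Gaussian parameters are made-up illustrative numbers; in the question's code, beta corresponds to the weight argument of tfpl.KLDivergenceRegularizer.

```python
import math

def kl_diag_gaussian_to_std_normal(mu, sigma):
    # Closed-form KL( N(mu, diag(sigma^2)) || N(0, I) )
    # = 0.5 * sum(mu^2 + sigma^2 - 1 - log sigma^2)
    return 0.5 * sum(m * m + s * s - 1.0 - math.log(s * s)
                     for m, s in zip(mu, sigma))

def beta_vae_loss(mse, mu, sigma, beta=1.0):
    # Loss = MSE + beta * KL; beta < 1 down-weights the KL term
    return mse + beta * kl_diag_gaussian_to_std_normal(mu, sigma)

mu, sigma = [0.5, -0.3], [0.8, 1.2]  # illustrative posterior parameters
mse = 0.42                           # illustrative reconstruction error

print(beta_vae_loss(mse, mu, sigma, beta=1.0))  # ≈ 0.671
print(beta_vae_loss(mse, mu, sigma, beta=0.1))  # ≈ 0.445
```

With beta = 0.1 the optimiser sees almost pure MSE, so the reconstruction improves at the expense of how closely the posterior matches the prior.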
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source
---|---
Solution 1 | Error404