Trying to understand the use of ReLU in an LSTM network

I am currently trying to optimize a simple NN with Optuna. Besides the learning rate, batch size, etc., I want to optimize the network architecture as well. So far I optimize the number of LSTM layers as well as the number of Dense layers. But now I am thinking about activation functions. Bear in mind that I am very new to NNs, but I am constantly reading about ReLU and Leaky ReLU, and I know an LSTM uses tanh and sigmoid internally. So at first I thought maybe the internal tanh gets swapped for a ReLU, but I think I got that wrong, right?
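To make this concrete, here is a stripped-down sketch of the kind of Optuna objective I mean. The parameter names and ranges are just examples, and the dummy score only stands in for my real training/validation loop so the snippet runs on its own:

```python
import optuna

def objective(trial):
    # Hyperparameters I already tune
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    batch_size = trial.suggest_categorical("batch_size", [32, 64, 128])
    n_lstm_layers = trial.suggest_int("n_lstm_layers", 1, 3)
    n_dense_layers = trial.suggest_int("n_dense_layers", 1, 3)

    # What I am now wondering about: an activation choice for the Dense part
    dense_activation = trial.suggest_categorical("dense_activation", ["relu", "leaky_relu"])

    # In my real code a model would be built from these values and trained here;
    # this dummy score is only a placeholder so the sketch is self-contained.
    validation_loss = float(lr * batch_size / (n_lstm_layers + n_dense_layers))
    return validation_loss

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```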

What I have seen is that nn.ReLU() gets applied in between layers, so I would think it only makes sense to apply it between my Dense layers?
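For example, this PyTorch sketch (layer sizes just made up) is roughly how I picture it: the nn.LSTM keeps its built-in sigmoid/tanh gate activations, since nn.LSTM does not expose an option to replace them, and nn.ReLU() only sits between the Linear (Dense) layers of the head.

```python
import torch
import torch.nn as nn

class LSTMRegressor(nn.Module):
    """LSTM followed by a small Dense head.

    The LSTM keeps its built-in sigmoid/tanh gate activations;
    ReLU is only applied between the Linear (Dense) layers.
    """
    def __init__(self, input_size=8, hidden_size=64, n_lstm_layers=2,
                 dense_sizes=(32, 16), output_size=1):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size,
                            num_layers=n_lstm_layers, batch_first=True)

        layers = []
        in_features = hidden_size
        for out_features in dense_sizes:
            layers.append(nn.Linear(in_features, out_features))
            layers.append(nn.ReLU())  # activation *between* Dense layers
            in_features = out_features
        layers.append(nn.Linear(in_features, output_size))  # no ReLU after the output
        self.head = nn.Sequential(*layers)

    def forward(self, x):
        # x: (batch, seq_len, input_size)
        out, _ = self.lstm(x)
        last_step = out[:, -1, :]  # use the last time step
        return self.head(last_step)

model = LSTMRegressor()
dummy = torch.randn(4, 10, 8)      # (batch=4, seq_len=10, features=8)
print(model(dummy).shape)          # torch.Size([4, 1])
```

So in my setup the only place an activation choice would actually apply is inside that Dense head.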

Sorry for the noob question. I am having trouble understanding these things because they are so basic that they are hardly discussed anywhere.


