A dense layer, also known as a fully connected layer, connects every neuron from the previous layer to every neuron in the current layer. It performs a linear transformation followed by an activation function.
from tensorflow.keras.layers import Dense
# Example of a dense layer with 64 neurons and ReLU activation
dense_layer = Dense(units=64, activation='relu')
The output of a dense layer is computed as: \[ \mathbf{y} = f(\mathbf{W}\mathbf{x} + \mathbf{b}) \] where \(\mathbf{x}\) is the input vector, \(\mathbf{W}\) is the weight matrix, \(\mathbf{b}\) is the bias vector, and \(f\) is the activation function.
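To make the formula concrete, here is a minimal sketch that computes a dense layer's output by hand with NumPy, assuming a 3-dimensional input, two output neurons, and ReLU as the activation \(f\) (all values are illustrative):
import numpy as np
# Weight matrix W (2 neurons x 3 inputs), input vector x, and bias vector b.
W = np.array([[0.2, -0.5, 0.1],
              [0.4, 0.3, -0.2]])
x = np.array([1.0, 2.0, 3.0])
b = np.array([0.1, -0.1])
# y = f(Wx + b) with f = ReLU, matching the formula above.
y = np.maximum(0.0, W @ x + b)
print(y)  # [0.  0.3]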
Recurrent neural networks (RNNs) are designed for sequential data, where each output depends on previous computations. Unlike dense layers, they maintain an internal memory, which makes them suitable for tasks such as time-series prediction and natural language processing.
A simple RNN processes sequential data by maintaining a hidden state that captures information from previous time steps.
from tensorflow.keras.layers import SimpleRNN
# Example of a simple RNN layer with 32 units
rnn_layer = SimpleRNN(units=32, return_sequences=True)
The hidden state \(\mathbf{h}_t\) at time step \(t\) is computed as: \[ \mathbf{h}_t = \tanh(\mathbf{W}_h \mathbf{h}_{t-1} + \mathbf{W}_x \mathbf{x}_t + \mathbf{b}) \]
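The following sketch implements this recurrence directly with NumPy, assuming 2 hidden units, 3 input features, and a 4-step input sequence with randomly chosen weights (purely illustrative; Keras learns these weights during training):
import numpy as np
rng = np.random.default_rng(0)
W_h = rng.normal(size=(2, 2))    # hidden-to-hidden weights
W_x = rng.normal(size=(2, 3))    # input-to-hidden weights
b = np.zeros(2)                  # bias
x_seq = rng.normal(size=(4, 3))  # input sequence: (time steps, features)
h = np.zeros(2)                  # initial hidden state h_0
for x_t in x_seq:
    h = np.tanh(W_h @ h + W_x @ x_t + b)  # h_t = tanh(W_h h_{t-1} + W_x x_t + b)
print(h)  # final hidden state after the last time step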
LSTMs (Long Short-Term Memory networks) address the vanishing gradient problem of vanilla RNNs by introducing gating mechanisms that control what information is stored, updated, and exposed at each time step.
from tensorflow.keras.layers import LSTM
# Example of an LSTM layer with 64 units
lstm_layer = LSTM(units=64, return_sequences=True)
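Concretely, the standard LSTM maintains a cell state \(\mathbf{c}_t\) alongside the hidden state \(\mathbf{h}_t\) and computes forget, input, and output gates from the previous hidden state and the current input (the notation below follows the common convention of per-gate weight matrices applied to the concatenation \([\mathbf{h}_{t-1}, \mathbf{x}_t]\)): \[ \mathbf{f}_t = \sigma(\mathbf{W}_f [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_f), \quad \mathbf{i}_t = \sigma(\mathbf{W}_i [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_i), \quad \mathbf{o}_t = \sigma(\mathbf{W}_o [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_o) \] \[ \tilde{\mathbf{c}}_t = \tanh(\mathbf{W}_c [\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_c), \qquad \mathbf{c}_t = \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \tilde{\mathbf{c}}_t, \qquad \mathbf{h}_t = \mathbf{o}_t \odot \tanh(\mathbf{c}_t) \] Because the cell state is updated additively and gated elementwise, gradients can flow across many time steps without shrinking as quickly as in a vanilla RNN.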
GRUs (Gated Recurrent Units) are a simpler variant of LSTMs: they merge the cell state and hidden state and use two gates (update and reset) instead of three, resulting in fewer parameters but similar performance in many cases.
from tensorflow.keras.layers import GRU
# Example of a GRU layer with 32 units
gru_layer = GRU(units=32, return_sequences=True)
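As a rough illustration of the parameter savings, the sketch below builds single-layer LSTM and GRU models on an assumed input of 10 time steps with 16 features each and compares their parameter counts (exact numbers depend on the input size and implementation details such as the GRU's reset_after option):
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, GRU
# An LSTM learns four weight sets (forget, input, and output gates plus the
# candidate cell state); a GRU learns three (update gate, reset gate, candidate
# hidden state), so the GRU ends up with roughly a quarter fewer parameters here.
lstm_model = Sequential([Input(shape=(10, 16)), LSTM(units=32)])
gru_model = Sequential([Input(shape=(10, 16)), GRU(units=32)])
print(lstm_model.count_params(), gru_model.count_params())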
A common architecture stacks recurrent layers, with return_sequences=True on every recurrent layer except the last so that each subsequent layer receives the full sequence of hidden states, followed by a dense layer that produces the final predictions.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM
# Two stacked LSTM layers followed by a softmax classifier.
# Inputs are sequences of 10 time steps with 16 features each.
model = Sequential([
    LSTM(units=64, return_sequences=True, input_shape=(10, 16)),  # emits the full sequence for the next LSTM
    LSTM(units=32),                                               # emits only the final hidden state
    Dense(units=10, activation='softmax')                         # probabilities over 10 output classes
])
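A hypothetical training call for this model could look like the following; the data is random placeholder input shaped to match the assumed 10-step, 16-feature sequences and 10 output classes:
import numpy as np
# Placeholder data: 256 random sequences with one-hot labels over 10 classes.
X = np.random.random((256, 10, 16))
y = np.eye(10)[np.random.randint(0, 10, size=256)]
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
model.fit(X, y, epochs=3, batch_size=32)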