deeplearning.ai - Building a Recurrent Neural Network

Table of Contents

1 - Forward propagation for the basic Recurrent Neural Network

1.1 - RNN cell

1.2 - RNN forward pass

2 - Long Short-Term Memory (LSTM) network

2.1 - LSTM cell

2.2 - Forward pass for LSTM

3 - Backpropagation in recurrent neural networks (OPTIONAL / UNGRADED)

3.1 - Basic RNN backward pass


Notation used in this article:

  • Superscript [l] denotes an object associated with the l-th layer. Example: a^{[4]} is the activation of the 4th layer; W^{[5]} and b^{[5]} are the parameters of the 5th layer.
  • Superscript (i) denotes an object associated with the i-th example. Example: x^{(i)} is the input of the i-th training example.
  • Superscript <t> denotes an object at the t-th time step. Example: x^{<t>} is the input x at time step t, and x^{(i)<t>} is the input of example i at time step t.
  • Subscript i denotes the i-th entry of a vector. Example: a_{i}^{[l]} is the i-th entry of the activations in layer l.

 

1 - Forward propagation for the basic Recurrent Neural Network

In this example, T_{x}=T_{y}.

How to implement an RNN:

  1. Implement the computations needed for a single time step of the RNN.
  2. Implement a loop over T_{x} time steps so that all of the inputs are processed, one step at a time.

1.1 - RNN cell

The figure below describes the operations of a basic RNN cell at a single time step:

Exercise: Implement the RNN cell described in the figure above.

Instructions:

  1. Compute the hidden state with the tanh activation: a^{<t>}=tanh(W_{aa}a^{<t-1>}+W_{ax}x^{<t>}+b_{a}).
  2. Use the new hidden state a^{<t>} to compute the prediction \hat{y}^{<t>}=softmax(W_{ya}a^{<t>}+b_{y}).
  3. Store (a^{<t>}, a^{<t-1>}, x^{<t>}, parameters) in a cache.
  4. Return a^{<t>}, \hat{y}^{<t>} and the cache.

The computation is vectorized over m examples, so x^{<t>} has shape (n_{x},m) and a^{<t>} has shape (n_{a},m).

# compute next activation state using the formula given above  (n_a, m)
a_next = np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba)
# compute output of the current cell using the formula given above
yt_pred = softmax(np.dot(Wya, a_next) + by)
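
Putting the pieces together, a minimal self-contained sketch of the full cell function might look like the following. The parameter dictionary keys and the softmax helper are assumptions (neither appears in the excerpt above), chosen to match the variable names used in the snippet.

import numpy as np

def softmax(z):
    # column-wise softmax; subtracting the max improves numerical stability
    e = np.exp(z - np.max(z, axis=0, keepdims=True))
    return e / np.sum(e, axis=0, keepdims=True)

def rnn_cell_forward(xt, a_prev, parameters):
    # unpack parameters (dictionary keys assumed to match the variable names above)
    Wax, Waa, Wya = parameters["Wax"], parameters["Waa"], parameters["Wya"]
    ba, by = parameters["ba"], parameters["by"]
    # hidden state: a<t> = tanh(Wax x<t> + Waa a<t-1> + ba), shape (n_a, m)
    a_next = np.tanh(np.dot(Wax, xt) + np.dot(Waa, a_prev) + ba)
    # prediction: y_hat<t> = softmax(Wya a<t> + by), shape (n_y, m)
    yt_pred = softmax(np.dot(Wya, a_next) + by)
    # store the values needed for the backward pass
    cache = (a_next, a_prev, xt, parameters)
    return a_next, yt_pred, cache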

1.2 - RNN forward pass

Exercise: Implement the forward propagation of the RNN.

Instructions:

  1. Create an array of zeros, a, that will store all of the hidden states computed by the RNN.
  2. Initialize the "next" hidden state as a_{0} (the initial hidden state).
  3. Loop over each time step: update the "next" hidden state and the cache, store the "next" hidden state in a, store the prediction in y, and append the cache to the list caches.
  4. Return a, y and caches.
# initialize "a" and "y" with zeros (≈2 lines)
a = np.zeros((n_a, m, T_x))
y_pred = np.zeros((n_y, m, T_x))
    
# Initialize a_next (≈1 line)
a_next = a0
    
# loop over all time-steps
for t in range(T_x):
    # Update next hidden state, compute the prediction, get the cache (≈1 line)
    a_next, yt_pred, cache = rnn_cell_forward(x[:,:,t], a_next, parameters)
    # a_prev cannot be taken as a[:,:,t-1] here: the loop starts at t = 0, so a[:,:,t-1] would be a[:,:,-1], which is not a valid previous state; carry a_next across iterations instead
    # Save the value of the new "next" hidden state in a (≈1 line)
    a[:,:,t] = a_next
    # Save the value of the prediction in y (≈1 line)
    y_pred[:,:,t] = yt_pred
    # Append "cache" to "caches" (≈1 line)
    caches.append(cache)
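
As a quick sanity check, here is a hedged usage sketch with arbitrary small dimensions. It assumes the lines above form the body of a function rnn_forward(x, a0, parameters) that also retrieves the dimensions, initializes caches as an empty list, and returns a, y_pred, caches; the parameter dictionary keys are assumed to match the variable names in the cell code.

import numpy as np

np.random.seed(1)
n_x, n_a, n_y, m, T_x = 3, 5, 2, 10, 4           # arbitrary small dimensions
x = np.random.randn(n_x, m, T_x)
a0 = np.random.randn(n_a, m)
parameters = {"Wax": np.random.randn(n_a, n_x), "Waa": np.random.randn(n_a, n_a),
              "Wya": np.random.randn(n_y, n_a),
              "ba": np.random.randn(n_a, 1), "by": np.random.randn(n_y, 1)}
a, y_pred, caches = rnn_forward(x, a0, parameters)
print(a.shape, y_pred.shape)                      # expected: (5, 10, 4) (2, 10, 4)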

 

2 - Long Short-Term Memory (LSTM) network

2.1 - LSTM cell

Exercise: Implement the LSTM cell described in the figure above.

Instructions:

  1. Concatenate a^{<t-1>} and x^{<t>} into a single matrix: concat=\begin{bmatrix} a^{<t-1>}\\ x^{<t>} \end{bmatrix}
  2. Compute all of the gate and state formulas from the figure (restated below for reference).
  3. Compute the prediction \hat{y}^{<t>}.
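
The original figure with the gate formulas is not reproduced in this post. For reference, the equations implemented by the snippet below are the standard LSTM cell equations, where \sigma denotes the sigmoid function, * denotes element-wise multiplication, and [a^{<t-1>},x^{<t>}] is the concatenation from step 1:

\Gamma_{f}^{<t>}=\sigma(W_{f}[a^{<t-1>},x^{<t>}]+b_{f}) (forget gate, ft)
\Gamma_{i}^{<t>}=\sigma(W_{i}[a^{<t-1>},x^{<t>}]+b_{i}) (update gate, it)
\tilde{c}^{<t>}=tanh(W_{c}[a^{<t-1>},x^{<t>}]+b_{c}) (candidate cell state, cct)
c^{<t>}=\Gamma_{f}^{<t>}*c^{<t-1>}+\Gamma_{i}^{<t>}*\tilde{c}^{<t>} (cell state, c_next)
\Gamma_{o}^{<t>}=\sigma(W_{o}[a^{<t-1>},x^{<t>}]+b_{o}) (output gate, ot)
a^{<t>}=\Gamma_{o}^{<t>}*tanh(c^{<t>}) (hidden state, a_next)
\hat{y}^{<t>}=softmax(W_{y}a^{<t>}+b_{y}) (prediction, yt_pred)
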
# Concatenate a_prev and xt (≈3 lines)
concat = np.zeros((n_a+n_x, m))
concat[: n_a, :] = a_prev
concat[n_a :, :] = xt

# Compute values for ft, it, cct, c_next, ot, a_next using the formulas given in figure (4) (≈6 lines)
ft = sigmoid(np.dot(Wf, concat) + bf)
it = sigmoid(np.dot(Wi, concat) + bi)
cct = np.tanh(np.dot(Wc, concat) + bc)
c_next = ft * c_prev + it * cct
ot = sigmoid(np.dot(Wo, concat) + bo)
a_next = ot * np.tanh(c_next)
    
# Compute prediction of the LSTM cell (≈1 line)
yt_pred = softmax(np.dot(Wy, a_next) + by)
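
For a quick shape check of a single step, a hedged sketch with arbitrary small dimensions follows. It assumes the lines above are wrapped in a function lstm_cell_forward(xt, a_prev, c_prev, parameters) that returns a_next, c_next, yt_pred, cache, matching the call used in section 2.2; the parameter dictionary keys are assumed to match the variable names in the cell code.

import numpy as np

np.random.seed(2)
n_x, n_a, n_y, m = 3, 5, 2, 10                    # arbitrary small dimensions
xt = np.random.randn(n_x, m)
a_prev = np.random.randn(n_a, m)
c_prev = np.random.randn(n_a, m)
parameters = {"Wf": np.random.randn(n_a, n_a + n_x), "bf": np.random.randn(n_a, 1),
              "Wi": np.random.randn(n_a, n_a + n_x), "bi": np.random.randn(n_a, 1),
              "Wc": np.random.randn(n_a, n_a + n_x), "bc": np.random.randn(n_a, 1),
              "Wo": np.random.randn(n_a, n_a + n_x), "bo": np.random.randn(n_a, 1),
              "Wy": np.random.randn(n_y, n_a), "by": np.random.randn(n_y, 1)}
a_next, c_next, yt_pred, cache = lstm_cell_forward(xt, a_prev, c_prev, parameters)
print(a_next.shape, c_next.shape, yt_pred.shape)  # expected: (5, 10) (5, 10) (2, 10)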

2.2 - Forward pass for LSTM

Exercise: Implement lstm_forward() to run the LSTM over T_{x} time steps.

Note: c^{<0>} is initialized with zeros.

# Retrieve dimensions from shapes of x and parameters['Wy'] (≈2 lines)
n_x, m, T_x = x.shape
n_y, n_a = parameters['Wy'].shape
    
# initialize "a", "c" and "y" with zeros (≈3 lines)
a = np.zeros((n_a, m, T_x))
c = np.zeros((n_a, m, T_x))
y = np.zeros((n_y, m, T_x))
    
# Initialize a_next and c_next (≈2 lines)
a_next = a0
c_next = np.zeros((n_a, m))
    
# loop over all time-steps
for t in range(T_x):
    # Update next hidden state, next memory state, compute the prediction, get the cache (≈1 line)
    a_next, c_next, yt, cache = lstm_cell_forward(x[:,:,t], a_next, c_next, parameters)
    # Save the value of the new "next" hidden state in a (≈1 line)
    a[:,:,t] = a_next
    # Save the value of the prediction in y (≈1 line)
    y[:,:,t] = yt
    # Save the value of the next cell state (≈1 line)
    c[:,:,t]  = c_next
    # Append the cache into caches (≈1 line)
    caches.append(cache)
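
And the corresponding check over a whole sequence, reusing the parameters dictionary from the single-step sketch above. It assumes the lines above form the body of lstm_forward(x, a0, parameters) that also initializes caches as an empty list and returns a, y, c, caches (the exact return signature is an assumption).

x = np.random.randn(n_x, m, 7)                    # T_x = 7 time steps
a0 = np.random.randn(n_a, m)
a, y, c, caches = lstm_forward(x, a0, parameters)
print(a.shape, y.shape, c.shape)                  # expected: (5, 10, 7) (2, 10, 7) (5, 10, 7)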

 

3 - Backpropagation in recurrent neural networks (OPTIONAL / UNGRADED)

3.1 - Basic RNN backward pass
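
The body of this optional part is not included in the post. As a hedged sketch only (the function name and returned dictionary are illustrative, not the notebook's exact code), the backward step through a single basic RNN cell differentiates a^{<t>}=tanh(W_{ax}x^{<t>}+W_{aa}a^{<t-1>}+b_{a}) with respect to its inputs and parameters, given the upstream gradient da_next and the cache saved by rnn_cell_forward:

import numpy as np

def rnn_cell_backward(da_next, cache):
    # unpack the cache stored by rnn_cell_forward
    a_next, a_prev, xt, parameters = cache
    Wax, Waa = parameters["Wax"], parameters["Waa"]
    # gradient through tanh: d tanh(z)/dz = 1 - tanh(z)^2, and a_next = tanh(z)
    dz = da_next * (1 - a_next ** 2)
    # gradients with respect to the cell's inputs and parameters
    dxt = np.dot(Wax.T, dz)
    da_prev = np.dot(Waa.T, dz)
    dWax = np.dot(dz, xt.T)
    dWaa = np.dot(dz, a_prev.T)
    dba = np.sum(dz, axis=1, keepdims=True)
    return {"dxt": dxt, "da_prev": da_prev, "dWax": dWax, "dWaa": dWaa, "dba": dba}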

 
