A deep neural network is simply a neural network with more than one hidden layer.
This article works through a network with two inputs, three hidden layers, and one output layer to illustrate forward and backward propagation in a deep neural network, including the full derivation and the code.
See Notes 13 and 14.
(1) Forward propagation, for a single training example:
$$\begin{array}{l}{a^{[0]}} = {\left( {{x_1},{x_2}} \right)^T}\\{z^{[1]}} = {w^{[1]}}{a^{[0]}} + {b^{[1]}}\\{a^{[1]}} = {g^{[1]}}({z^{[1]}})\\{z^{[2]}} = {w^{[2]}}{a^{[1]}} + {b^{[2]}}\\{a^{[2]}} = {g^{[2]}}({z^{[2]}})\\{z^{[3]}} = {w^{[3]}}{a^{[2]}} + {b^{[3]}}\\{a^{[3]}} = {g^{[3]}}({z^{[3]}})\\{z^{[4]}} = {w^{[4]}}{a^{[3]}} + {b^{[4]}}\\{a^{[4]}} = {g^{[4]}}({z^{[4]}}) = \widehat y\end{array}$$
From the pattern above, for any layer $L$:
$$\begin{array}{l}{z^{[L]}} = {w^{[L]}}{a^{[L - 1]}} + {b^{[L]}}\\{a^{[L]}} = {g^{[L]}}({z^{[L]}})\end{array}$$
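This recurrence maps directly onto a short vectorized loop. Below is a minimal sketch, assuming sigmoid activations for every layer; the parameter lists Ws and bs are illustrative names, with each weight matrix shaped as in the math above, $n_l \times n_{l-1}$ (note that the full code later in this article stores the transpose):

import numpy as np

def sigmoid(z):
    return 1/(1+np.exp(-z))

def forward(a, Ws, bs):
    # a starts as a^[0]; one pass per layer
    for W, b in zip(Ws, bs):
        z = np.dot(W, a) + b   # z^[l] = w^[l] a^[l-1] + b^[l]
        a = sigmoid(z)         # a^[l] = g^[l](z^[l])
    return a                   # a^[L] = y_hat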
(2) Backward propagation: the derivatives follow from the chain rule. With sigmoid activations and the cross-entropy loss at the output:
$$\begin{array}{l}d{a^{[4]}} = - \frac{y}{{{a^{[4]}}}} + \frac{{1 - y}}{{1 - {a^{[4]}}}}\\d{z^{[4]}} = {a^{[4]}} - y\\d{w^{[4]}} = d{z^{[4]}}{a^{[3]T}}\\d{b^{[4]}} = d{z^{[4]}}\\d{a^{[3]}} = {w^{[4]T}}d{z^{[4]}}\\d{z^{[3]}} = d{a^{[3]}} \odot {a^{[3]}}(1 - {a^{[3]}})\\d{w^{[3]}} = d{z^{[3]}}{a^{[2]T}}\\d{b^{[3]}} = d{z^{[3]}}\\ \ldots \end{array}$$
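The compact expression $dz^{[4]} = a^{[4]} - y$ follows from substituting the sigmoid derivative $g'^{[4]}(z^{[4]}) = a^{[4]}(1 - a^{[4]})$ into the chain rule:

$$d{z^{[4]}} = d{a^{[4]}} \cdot {a^{[4]}}(1 - {a^{[4]}}) = \left( - \frac{y}{{{a^{[4]}}}} + \frac{{1 - y}}{{1 - {a^{[4]}}}} \right){a^{[4]}}(1 - {a^{[4]}}) = {a^{[4]}} - y$$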
In general, the gradients of a hidden layer's weights and biases can be written as:
$$\begin{array}{l}d{z^{[L]}} = d{a^{[L]}} \odot {a^{[L]}}(1 - {a^{[L]}})\\d{w^{[L]}} = d{z^{[L]}}{a^{[L - 1]T}}\\d{b^{[L]}} = d{z^{[L]}}\\d{a^{[L - 1]}} = {w^{[L]T}}d{z^{[L]}}\end{array}$$
where $\odot$ denotes the element-wise product.
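These four lines repeat unchanged for every hidden layer, which suggests a reusable helper. A minimal sketch, assuming sigmoid activations and single-example column-vector shapes as in the math above (backward_step and its argument names are illustrative, not part of this article's code; W here uses the math convention $n_L \times n_{L-1}$):

import numpy as np

def backward_step(dA, A, A_prev, W):
    dZ = dA * A * (1 - A)          # dz^[L] = da^[L] ⊙ a^[L](1 - a^[L])
    dW = np.dot(dZ, A_prev.T)      # dw^[L] = dz^[L] a^[L-1]T
    db = dZ                        # db^[L] = dz^[L] (single example)
    dA_prev = np.dot(W.T, dZ)      # da^[L-1] = w^[L]T dz^[L]
    return dW, db, dA_prev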
4. Python Code Implementation
The input and output samples use the same format as in Note 14, and the deep network from Section 1 is trained with gradient descent. To show every hidden-layer weight and bias explicitly, the code stores them as individual variables rather than arrays, and writes out each step of the derivation instead of looping over the layers.
# -*- coding: utf-8 -*-
"""
Created on Wed Apr 25 16:06:58 2018
@author: HGaviN
"""
import numpy as np
m = 10
n_0 = 2
n_1 = 4
n_2 = 3
n_3 = 2
n_4 = 1 # these could be stored in an array
X = np.array([[1,1],[1.5,1.5],[2,2],[2.5,2.5],[2.75,2.75],[3.15,3.15],[3.5,3.5],[3.75,3.75],[4,4],[4.5,4.5]])#create some examples
X = X.T # transposition
Y = np.array([[0],[0],[0],[0],[0],[1],[1],[1],[1],[1]])
#initialization
alpha = 0.01 # learning rate
# weight and bias initialization
W1 = 0.01*np.random.rand(n_0,n_1)
W2 = 0.01*np.random.rand(n_1,n_2)
W3 = 0.01*np.random.rand(n_2,n_3)
W4 = 0.01*np.random.rand(n_3,n_4)
b1 = np.zeros([n_1,1])
b2 = np.zeros([n_2,1])
b3 = np.zeros([n_3,1])
b4 = np.zeros([n_4,1])
# gradient buffers (shapes of the backward-pass quantities)
dW1 = np.zeros([n_0,n_1])
dZ1 = np.zeros([m,n_1])
db1 = np.zeros([n_1,1])
dW2 = np.zeros([n_1,n_2])
dZ2 = np.zeros([m,n_2])
db2 = np.zeros([n_2,1])
dW3 = np.zeros([n_2,n_3])
dZ3 = np.zeros([m,n_3])
db3 = np.zeros([n_3,1])
dW4 = np.zeros([n_3,n_4])
dZ4 = np.zeros([m,n_4])
db4 = np.zeros([n_4,1])
j = 0
for it in range(50):
    # Forward propagation
    Z1 = np.dot(W1.T,X)+b1     # n_1 X m
    A1 = 1/(1+np.exp(-Z1))     # n_1 X m
    Z2 = np.dot(W2.T,A1)+b2    # n_2 X m
    A2 = 1/(1+np.exp(-Z2))     # n_2 X m
    Z3 = np.dot(W3.T,A2)+b3    # n_3 X m
    A3 = 1/(1+np.exp(-Z3))     # n_3 X m
    Z4 = np.dot(W4.T,A3)+b4    # n_4 X m
    A4 = 1/(1+np.exp(-Z4))     # n_4 X m
    # Backward propagation
    dZ4 = A4.T - Y                                  # m X n_4
    dW4 = 1/m*np.dot(A3,dZ4)                        # n_3 X n_4
    db4 = 1/m*np.sum(dZ4,axis=0,keepdims=True).T    # n_4 X 1
    dA3 = np.dot(dZ4,W4.T)                          # m X n_3
    dZ3 = np.multiply(dA3,np.multiply(A3,1-A3).T)   # m X n_3
    dW3 = 1/m*np.dot(A2,dZ3)                        # n_2 X n_3
    db3 = 1/m*np.sum(dZ3,axis=0,keepdims=True).T    # n_3 X 1
    dA2 = np.dot(dZ3,W3.T)                          # m X n_2
    dZ2 = np.multiply(dA2,np.multiply(A2,1-A2).T)   # m X n_2
    dW2 = 1/m*np.dot(A1,dZ2)                        # n_1 X n_2
    db2 = 1/m*np.sum(dZ2,axis=0,keepdims=True).T    # n_2 X 1
    dA1 = np.dot(dZ2,W2.T)                          # m X n_1
    dZ1 = np.multiply(dA1,np.multiply(A1,1-A1).T)   # m X n_1
    dW1 = 1/m*np.dot(X,dZ1)                         # n_0 X n_1
    db1 = 1/m*np.sum(dZ1,axis=0,keepdims=True).T    # n_1 X 1
    # Gradient descent update
    W4 = W4 - alpha*dW4
    W3 = W3 - alpha*dW3
    W2 = W2 - alpha*dW2
    W1 = W1 - alpha*dW1
    b4 = b4 - alpha*db4
    b3 = b3 - alpha*db3
    b2 = b2 - alpha*db2
    b1 = b1 - alpha*db1
    # np.multiply is element-wise, like .* in MATLAB
    # cross-entropy cost of the network output A4 (not A2)
    j = -1/m*np.sum(np.multiply(Y.T,np.log(A4))+np.multiply((1-Y).T,np.log(1-A4)))
    print(j)
    print('\n')
# predict the label of a new input
xp = np.array([[4],[3.5]])
z1p = np.dot(W1.T,xp)+b1
a1p = 1/(1+np.exp(-z1p))
z2p = np.dot(W2.T,a1p)+b2
a2p = 1/(1+np.exp(-z2p))
z3p = np.dot(W3.T,a2p)+b3
a3p = 1/(1+np.exp(-z3p))
z4p=np.dot(W4.T,a3p)+b4
a4p = 1/(1+np.exp(-z4p))
print (a4p)
5. Parameters and Hyperparameters
Generally speaking, a neural network's parameters are its weights and biases, while its hyperparameters are the settings that control how those parameters are learned: the number of hidden layers, the number of units in each hidden layer, the choice of activation function, the learning rate, the number of iterations, and so on.
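To make the distinction concrete, here is a minimal sketch in which the hyperparameters are fixed up front (the dictionary hyperparams is an illustrative name) and the parameters they govern are then created and later updated by training:

import numpy as np

# hyperparameters: chosen by hand, never updated by training
hyperparams = {
    'layer_sizes': [2, 4, 3, 2, 1],   # n_0 .. n_4, as in the code above
    'learning_rate': 0.01,            # alpha
    'iterations': 50,
}

# parameters: weights and biases, updated by gradient descent
sizes = hyperparams['layer_sizes']
Ws = [0.01*np.random.rand(sizes[l], sizes[l+1]) for l in range(len(sizes)-1)]
bs = [np.zeros([sizes[l+1], 1]) for l in range(len(sizes)-1)]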