Contents link: Andrew Ng Deep Learning Study Notes (Table of Contents)
1. Convolutional network
2. Backward propagation of CNN
Note: see
Convolutional Neural Networks: Step by Step
1. Convolutional network
Notation:
① superscript [l] denotes the l-th layer of the network
② superscript (i) denotes the i-th example
③ subscript i denotes the i-th filter of a layer
④ n_H, n_W and n_C denote the height, width and number of channels of a layer
1.1 packages
import numpy as np
import h5py
import matplotlib.pyplot as plt
%load_ext autoreload
%autoreload 2  # automatically reload all modules except those excluded by %aimport
np.random.seed(1)
1.2 Outline of the Assignment
(1) Convolution functions:
Padding
Convolution filter
Forward convolution
Backward convolution
(2) Pooling functions:
Forward pooling
Create mask
Distribute value
1.3 CNN
A deep-learning framework can create a convolutional layer with a single line of code. Here, however, we implement the convolutional layer by hand in order to understand how it actually works:
(1) Padding
np.pad(array, pad_width, mode):
Parameters:
array: the array to be padded;
pad_width: a tuple of tuples such as ((1,2),(3,4),...): the first inner tuple applies to the first axis, the second to the second axis, and so on; (1,2) means padding one row of values before the first axis and two rows after it;
mode: the padding mode, e.g. constant, edge, etc. Here constant is used; its default fill value is 0, and constant_values=(1,3) means that along an axis the leading padding is filled with 1 and the trailing padding with 3.
Example:
a = np.array([[1,2],[3,4]])
a_pad = np.pad(a,((1,1),(1,2)),'constant',constant_values = (0,9))
"""
Output:
a: [[1 2]
[3 4]]
a_pad: [[0 0 0 9 9]
[0 1 2 9 9]
[0 3 4 9 9]
[0 9 9 9 9]]
"""
Define the padding function:
def zero_pad(X, pad):
    """
    Argument:
    X: a numpy array of m samples, shape = (m, n_H, n_W, n_C), m is the number of samples
    pad: integer, amount of padding on the horizontal and vertical dimensions
    return:
    X_pad: padded image of shape = (m, n_H + 2*pad, n_W + 2*pad, n_C)
    """
    X_pad = np.pad(X, ((0,0), (pad,pad), (pad,pad), (0,0)), "constant")
    return X_pad
Test:
np.random.seed(1)
x = np.random.randn(4,3,3,2)
x_pad = zero_pad(x,2)
print("x.shape = ",x.shape)
print("x_pad.shape",x_pad.shape)
fig,ax = plt.subplots(1,2)
ax[0].set_title('x')
ax[0].imshow(x[0,:,:,0])
ax[1].set_title('x_pad')
ax[1].imshow(x_pad[0,:,:,0])
(2) Single-step convolution
Steps of a single convolution step:
input data → apply the filter at each position of the data → output data (the number of channels and the size may change)
def conv_single_step(a_slice_pre, W, b):
    """
    Argument:
    a_slice_pre: a slice of input, dim = (f, f, n_C_prev)
    W: weights of kernel, dim = (f, f, n_C_prev)
    b: bias of kernel, dim = (1, 1, 1)
    return:
    Z: scalar, the output of convolving a_slice_pre
    """
    s = np.multiply(a_slice_pre, W) + b
    Z = np.sum(s)
    return Z
Test: the slice taken from the input (the data for one convolution step) has shape (4,4,3) and the filter has shape (4,4,3). Note: each channel of the input data has its own (4,4) W matrix.
np.random.seed(1)
a_slice_pre = np.random.randn(4,4,3)
W = np.random.randn(4,4,3)
b = np.random.randn(1,1,1)
Z = conv_single_step(a_slice_pre,W,b)
print(Z)
"""
Output: -23.16021220252078
"""
(3) Single-layer forward convolution
In the forward pass of a convolutional network, several filters convolve the input; each filter produces a 2D matrix, and the per-filter outputs are stacked into a 3D volume.
Implement the forward pass of one convolutional layer: the input is the activation output A_pre of the previous layer, the filter weights are given by W, each filter has its own bias b (shared across the channels within a filter), and two hyperparameters, padding and stride, are also given.
Hints:
① Use slicing to take a_slice_prev out of the input matrix.
② To define an a_slice_prev, the parameters you need to determine are vert_start, vert_end, horiz_start and horiz_end.
def conv_forward(A_pre, W, b, hyperparams):
    """
    Argument:
    A_pre: the activations of the previous layer, dim = (m, n_H_prev, n_W_prev, n_C_prev)
    W: weight matrix, dim = (f, f, n_C_prev, n_C)
    b: bias vector, dim = (1, 1, 1, n_C)
    hyperparams: pad and stride
    return:
    Z: conv output of current layer, dim = (m, n_H, n_W, n_C)
    cache: store values for conv_backward()
    """
    (m, n_H_prev, n_W_prev, n_C_prev) = A_pre.shape
    (f, f, n_C_prev, n_C) = W.shape
    stride = hyperparams["stride"]
    pad = hyperparams["pad"]
    n_H = int((n_H_prev + 2 * pad - f) / stride) + 1
    n_W = int((n_W_prev + 2 * pad - f) / stride) + 1
    Z = np.zeros((m, n_H, n_W, n_C))
    A_pre_pad = zero_pad(A_pre, pad)
    for i in range(m):                        # loop over the samples
        a_pre_pad = A_pre_pad[i]
        for h in range(n_H):                  # vertical position of the output
            for w in range(n_W):              # horizontal position of the output
                for c in range(n_C):          # output channel (one per filter)
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    a_slice_pre = a_pre_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    Z[i, h, w, c] = conv_single_step(a_slice_pre, W[..., c], b[..., c])
    assert (Z.shape == (m, n_H, n_W, n_C))
    cache = (A_pre, W, b, hyperparams)
    return Z, cache
Test:
np.random.seed(1)
A_pre = np.random.randn(10,4,4,3)
W = np.random.randn(2,2,3,8)
b = np.random.randn(1,1,1,8)
hy = {"pad":2,"stride":1}
Z,caches = conv_forward(A_pre,W,b,hy)
print(np.mean(Z))
print(caches[0][1][2][3])  # sample index 1 of A_pre, n_H index 2, n_W index 3, values of all 3 channels
"""
Output:
Z: 0.15585932488906465
cache: [-0.20075807 0.18656139 0.41005165]
"""
Note 1:
In the for loops, ① the outermost loop selects sample i; ② the next loop scans that sample vertically with the filter (vertical); since the output height is n_H, it runs n_H times; ③ the next loop scans horizontally (horizontal), n_W times; ④ the innermost loop, once the filter's receptive field has been selected, computes the Z value of each output channel.
Note 2:
When computing the Z value of channel c of the current layer, W[...,c] selects the c-th filter along the output-channel dimension and takes everything along the remaining dimensions, i.e. it selects the weight matrix of the c-th filter.
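As a quick check of these two notes (not part of the original assignment), reusing W and Z from the test above: W[..., c] is ordinary NumPy ellipsis indexing, and the output height follows n_H = (n_H_prev + 2*pad - f)//stride + 1.
print(np.array_equal(W[..., 0], W[:, :, :, 0]))  # True: "..." selects everything along the other dimensions
print(Z.shape)                                   # (10, 7, 7, 8): n_H = (4 + 2*2 - 2)//1 + 1 = 7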
1.4 Pooling layer
A pooling layer shrinks the height and width of the data, which reduces the amount of computation, and it also makes the feature detectors less sensitive to small shifts in position. There are two common kinds of pooling: max pooling and average pooling. A pooling layer has no parameters to train with backpropagation, but it does have hyperparameters: the kernel size f and the stride.
def pool_forward(A_pre, hyperparams, mode="max"):
    """
    Argument:
    A_pre: the activations of the previous layer, dim = (m, n_H_prev, n_W_prev, n_C_prev)
    hyperparams: f and stride
    mode: "max" or "average"
    return:
    A: output of the pooling layer, dim = (m, n_H, n_W, n_C)
    cache: store values for pool_backward()
    """
    (m, n_H_prev, n_W_prev, n_C_prev) = A_pre.shape
    f = hyperparams["f"]
    stride = hyperparams["stride"]
    n_H = int((n_H_prev - f) / stride) + 1
    n_W = int((n_W_prev - f) / stride) + 1
    n_C = n_C_prev
    A = np.zeros((m, n_H, n_W, n_C))
    for i in range(m):
        a_pre = A_pre[i]
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    a_slice_pre = a_pre[vert_start:vert_end, horiz_start:horiz_end, c]
                    if mode == "max":
                        A[i, h, w, c] = np.max(a_slice_pre)
                    elif mode == "average":
                        A[i, h, w, c] = np.mean(a_slice_pre)
    cache = (A_pre, hyperparams)
    assert (A.shape == (m, n_H, n_W, n_C))
    return A, cache
Note:
Notice that the a_slice_pre slices differ between pooling and convolution. In convolution, the last dimension takes all values: the data of every channel in that region is convolved with the corresponding channels of one kernel and then summed, so each kernel produces a single output channel. In pooling, however many channels there were before pooling, there are afterwards, and each channel is computed independently.
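A small illustration of this difference (not part of the original assignment), reusing conv_single_step and pool_forward from above: one convolution step collapses all input channels into a single number, while pooling keeps every channel.
np.random.seed(1)
x_demo = np.random.randn(1, 3, 3, 4)                      # one sample with 4 channels
w_demo = np.random.randn(2, 2, 4)                         # a single 2x2 filter spanning all 4 channels
b_demo = np.zeros((1, 1, 1))
z_demo = conv_single_step(x_demo[0, 0:2, 0:2, :], w_demo, b_demo)
print(np.ndim(z_demo))                                    # 0: the 4 channels are summed into one number
a_demo, _ = pool_forward(x_demo, {"f": 2, "stride": 1}, mode="max")
print(a_demo.shape)                                       # (1, 2, 2, 4): the channel count is unchanged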
2. Backward propagation of CNN
Because a CNN contains convolutional layers (local connectivity) and pooling layers (which shrink h and w), its backward propagation differs from that of a DNN. The detailed derivation is given in another study note, "CNN forward propagation and BP backward propagation, step by step"; please read that note first, since the formulas in the assignment below rely on the derivations there.
2.1 Convolutional layer
(1) Computing dA
The function computes dA with the following formula:
$$ dA \;+\!=\; \sum_{h}\sum_{w} W_c \times dZ_{hw} $$
where $W_c$ is the c-th filter and $dZ_{hw}$ is the derivative of the cost with respect to the entry at row h, column w of one channel of the layer-l output Z (i.e. the derivative with respect to the value obtained by convolving one local region of layer l-1). Expanding the two sums amounts to convolving dZ with W (call this convolution formula 1):
da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:,:,:,c] * dZ[i, h, w, c]
Interpreting the code (suppose the input is 3-channel 3×3 data, there are 2 filters of size 2×2×3, and the stride is 1, so the output is 2-channel 2×2 data):
① The first for loop picks the i-th sample; its dZ has dimensions (n_H, n_W, n_C) = (2, 2, 2).
② The second and third for loops pick the value of dZ at row h, column w (for example, row 1, column 1 across all channels).
③ The fourth loop picks one kernel (i.e. one channel of dZ) and multiplies the value selected in ② element-wise with that kernel (in numpy, * is element-wise multiplication; a scalar times a matrix is broadcast), which gives the contribution to dA.
The += sign is there for two reasons:
i. within one channel of dZ and one kernel, the receptive fields of neighbouring output positions overlap, so their contributions to a channel of dA add up, and the accumulated result equals the result of convolution formula 1;
ii. the contributions of all n_C filters (the c loop) are accumulated into the same dA_prev.
(2) Computing dW
dW is computed with the following formula:
dW[:,:,:,c] += a_slice * dZ[i, h, w, c]
Note: why is the accumulation within one dA channel only a partial overlap of cells, while for dW every corresponding cell is accumulated? Because in the dA expression, the cells selected by da_prev_pad[vert_start:vert_end, horiz_start:horiz_end, :] only partially overlap from one (h, w) position to the next, whereas in the dW expression, dW[:,:,:,c] selects all of its cells every time, so every corresponding cell is accumulated.
(3) Computing db
db is computed with the following formula:
db[:,:,:,c] += dZ[i, h, w, c]
When computing db, all of the dZ values of one channel are summed to form the bias gradient of that filter.
The code for computing dA, dW and db of a convolutional layer is as follows:
def conv_backward(dZ, cache):
    """
    Argument:
    dZ: gradient of the cost with respect to the output Z of layer l, dim = (m, n_H, n_W, n_C)
    cache: (A_pre, W, b, hyperparams)
    return:
    dA_pre: (m, n_H_pre, n_W_pre, n_C_pre)
    dW: (f, f, n_C_pre, n_C)
    db: (1, 1, 1, n_C)
    """
    (A_pre, W, b, hyperparams) = cache
    (m, n_H_pre, n_W_pre, n_C_pre) = A_pre.shape
    (f, f, n_C_pre, n_C) = W.shape
    stride = hyperparams["stride"]
    pad = hyperparams["pad"]
    (m, n_H, n_W, n_C) = dZ.shape
    dA_pre = np.zeros((m, n_H_pre, n_W_pre, n_C_pre))
    dW = np.zeros((f, f, n_C_pre, n_C))
    db = np.zeros((1, 1, 1, n_C))
    A_pre_pad = zero_pad(A_pre, pad)
    dA_pre_pad = zero_pad(dA_pre, pad)
    for i in range(m):
        a_pre_pad = A_pre_pad[i]
        da_pre_pad = dA_pre_pad[i]
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    a_slice = a_pre_pad[vert_start:vert_end, horiz_start:horiz_end, :]
                    da_pre_pad[vert_start:vert_end, horiz_start:horiz_end, :] += W[:, :, :, c] * dZ[i, h, w, c]
                    dW[:, :, :, c] += a_slice * dZ[i, h, w, c]
                    db[:, :, :, c] += dZ[i, h, w, c]
        dA_pre[i, :, :, :] = da_pre_pad[pad:-pad, pad:-pad, :]
    assert (dA_pre.shape == (m, n_H_pre, n_W_pre, n_C_pre))
    return dA_pre, dW, db
Test:
np.random.seed(1)
dA,dW,db = conv_backward(Z,caches)
print(np.mean(dA),np.mean(dW),np.mean(db))
"""
Output: 9.608990675868995 10.581741275547566 76.37106919563735
"""
2.2 Pooling layer
(1) Max pooling
First we need a helper function, create_mask_from_window(), that records the position of the maximum value during max pooling:
Its input is an (f,f) matrix and its output is an (f,f) matrix that is True at the position of the maximum value and False everywhere else:
def create_mask_from_window(x):
    """
    Argument:
    x: an array, dim = (f, f)
    return:
    mask: an array, dim = (f, f), True at the position of the max value of x
    """
    mask = (x == np.max(x))
    return mask
Test:
np.random.seed(1)
x = np.random.randn(2,3)
mask = create_mask_from_window(x)
print(mask)
"""
Output:
[[ True False False]
[False False False]]
"""
(2) Average pooling
For average pooling, we simply distribute each value dz of the pooled output evenly back over the positions it came from:
def distribute_value(dz, shape):
    # distribute the scalar gradient dz evenly over a window of the given shape
    (n_h, n_w) = shape
    average = dz / (n_h * n_w)
    a = np.ones(shape) * average
    return a
Test:
a = distribute_value(2,(2,2))
print(a)
"""
Output:
[[0.5 0.5]
[0.5 0.5]]
"""
(3) Build the complete pooling backward-propagation function
def pool_backward(dA, cache, mode="max"):
    """
    return: dA_pre
    """
    (A_pre, hyperparams) = cache
    stride = hyperparams["stride"]
    f = hyperparams["f"]
    (m, n_H_pre, n_W_pre, n_C_pre) = A_pre.shape
    (m, n_H, n_W, n_C) = dA.shape
    dA_pre = np.zeros(A_pre.shape)
    for i in range(m):
        a_pre = A_pre[i]
        for h in range(n_H):
            for w in range(n_W):
                for c in range(n_C):
                    vert_start = h * stride
                    vert_end = vert_start + f
                    horiz_start = w * stride
                    horiz_end = horiz_start + f
                    if mode == "max":
                        a_pre_slice = a_pre[vert_start:vert_end, horiz_start:horiz_end, c]
                        mask = create_mask_from_window(a_pre_slice)
                        dA_pre[i, vert_start:vert_end, horiz_start:horiz_end, c] += np.multiply(mask, dA[i, h, w, c])
                    elif mode == "average":
                        da = dA[i, h, w, c]
                        shape = (f, f)
                        dA_pre[i, vert_start:vert_end, horiz_start:horiz_end, c] += distribute_value(da, shape)
    assert (dA_pre.shape == A_pre.shape)
    return dA_pre
Test:
np.random.seed(1)
A_prev = np.random.randn(5, 5, 3, 2)
hparameters = {"stride" : 1, "f": 2}
A, cache = pool_forward(A_prev, hparameters)
dA = np.random.randn(5, 4, 2, 2)
dA_prev = pool_backward(dA, cache, mode = "max")
print("mode = max")
print('mean of dA = ', np.mean(dA))
print('dA_prev[1,:,:,0] = ', dA_prev[1,:,:,0])
print()
dA_prev = pool_backward(dA, cache, mode = "average")
print("mode = average")
print('mean of dA = ', np.mean(dA))
print('dA_prev[1,:,:,1] = ', dA_prev[1,:,:,1])
"""
Output:
mode = max
mean of dA = 0.14571390272918056
dA_prev[1,:,:,0] =
[[ 0. 0. 0. ]
[ 0. 5.05844394 0. ]
[ 0. 0. 0. ]
[ 0. 1.37512611 0. ]
[ 0. -0.59248892 0. ]]
mode = average
mean of dA = 0.14571390272918056
dA_prev[1,:,:,1] =
[[ 0.05338348 -0.42070676 -0.47409023]
[ 0.2787552 -0.25749373 -0.53624893]
[ 0.16879316 0.0348075 -0.13398566]
[-0.13652896 -0.129969 0.00655996]
[-0.0799504 -0.00156347 0.07838693]]
"""
Note: why does the pooling backward pass also need +=? Weren't the channels independent? They are, but += is still needed: when the kernel's stride is smaller than the kernel size, the windows overlap, so within the same channel the overlapping parts are accumulated.
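A minimal sketch of this overlap (not part of the original assignment), reusing pool_forward and pool_backward from above with f = 2 and stride = 1 on a 3x3 input:
np.random.seed(1)
A_small = np.random.randn(1, 3, 3, 1)
A_out, cache_small = pool_forward(A_small, {"f": 2, "stride": 1}, mode="average")
dA_small = np.ones(A_out.shape)                           # give every pooled output a gradient of 1
dA_prev_small = pool_backward(dA_small, cache_small, mode="average")
print(dA_prev_small[0, :, :, 0])
# the centre cell lies in all four 2x2 windows and receives 4 * (1/4) = 1;
# each corner lies in only one window and receives 1/4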
Summary
This assignment is mainly about understanding the forward- and backward-propagation mechanics of the convolutional and pooling layers of a CNN and implementing them by hand. The second assignment of this lesson uses TensorFlow 1; since I have TensorFlow 2.0 installed, I did not type it out, because in 2.0 it takes only a few lines rather than the much longer TensorFlow 1 version. For image classification with TensorFlow 2.0, see Tensorflow Study Notes (6): Convolutional Neural Networks; the TensorFlow 1 code is in Convolutional Neural Networks: Application.