Week 3 programming assignment. If you are not sure how to set up the assignment environment, see the previous article:
http://blog.csdn.net/liuzhongkai123/article/details/78766351
This week I have also copied the assignment document over.
Welcome to your week 3 programming assignment. It’s time to build your first neural network, which will have a hidden layer. You will see a big difference between this model and the one you implemented using logistic regression.
You will learn how to:
- Implement a 2-class classification neural network with a single hidden layer
- Use units with a non-linear activation function, such as tanh
- Compute the cross entropy loss
- Implement forward and backward propagation
Let’s first import all the packages that you will need during this assignment.
- numpy is the fundamental package for scientific computing with Python.
- sklearn provides simple and efficient tools for data mining and data analysis.
- matplotlib is a library for plotting graphs in Python.
- testCases provides some test examples to assess the correctness of your functions
- planar_utils provide various useful functions used in this assignment
Import the packages and modules used in this week's assignment:
import numpy as np
import matplotlib.pyplot as plt
from testCases import *
import sklearn
import sklearn.datasets
import sklearn.linear_model
from planar_utils import plot_decision_boundary, sigmoid, load_planar_dataset, load_extra_datasets
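As in the original notebook, it helps to fix a random seed right after the imports so your results stay consistent between runs (in Jupyter, %matplotlib inline is also set at this point):

np.random.seed(1)  # set a seed so that the results are consistent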
First, let’s get the dataset you will work on. The following code will load a “flower” 2-class dataset into variables X and Y.
Use load_planar_dataset() to obtain the dataset. It returns X, the (x1, x2) coordinates of each point (generated from a radius and an angle), and Y, the color label (red or blue).
def load_planar_dataset():
    np.random.seed(1)
    m = 400                              # number of examples
    N = int(m/2)                         # number of points per class (one red, one blue)
    D = 2                                # dimensionality (2-D points)
    X = np.zeros((m,D))                  # data matrix where each row is a single example
    Y = np.zeros((m,1), dtype='uint8')   # labels vector (0 for red, 1 for blue)
    a = 4                                # maximum ray of the flower
    # np.linspace returns evenly spaced numbers over a specified interval.
    for j in range(2):
        ix = range(N*j,N*(j+1))          # indices 0..199 for class 0, 200..399 for class 1
        t = np.linspace(j*3.12,(j+1)*3.12,N) + np.random.randn(N)*0.2   # theta: N angles plus noise, so the points spread out unevenly
        r = a*np.sin(4*t) + np.random.randn(N)*0.2                      # radius: a*sin(4t) plus noise, so the petals are not perfectly smooth
        X[ix] = np.c_[r*np.sin(t), r*np.cos(t)]                         # convert polar (r, t) to Cartesian coordinates
        Y[ix] = j                                                       # class label: 0 = red, 1 = blue
    X = X.T
    Y = Y.T
    return X, Y
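Call the function to obtain the data used in the rest of the assignment:

X, Y = load_planar_dataset()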
Scatter plot of the dataset:
plt.scatter(X[0, :], X[1, :], c=np.squeeze(Y), s=40, cmap=plt.cm.Spectral)  # squeeze Y here, otherwise the call may raise an error
Result: a flower-shaped scatter of red (y=0) and blue (y=1) points.
You have:
- a numpy-array (matrix) X that contains your features (x1, x2)
- a numpy-array (vector) Y that contains your labels (red:0, blue:1).
Let's first get a better sense of what our data is like.
Exercise: How many training examples do you have? In addition, what is the shape of the variables X and Y?
Hint: How do you get the shape of a numpy array? (help)
### START CODE HERE ### (≈ 3 lines of code)
shape_X = np.shape(X)
shape_Y = np.shape(Y)
m = X.shape[1]  # number of training examples
### END CODE HERE ###
print ('The shape of X is: ' + str(shape_X))
print ('The shape of Y is: ' + str(shape_Y))
print ('I have m = %d training examples!' % (m))
Result:
The shape of X is: (2, 400)
The shape of Y is: (1, 400)
I have m = 400 training examples!
Before building a full neural network, let's first see how logistic regression performs on this problem. You can use sklearn's built-in functions to do that. Run the code below to train a logistic regression classifier on the dataset.
# Train the logistic regression classifier
clf = sklearn.linear_model.LogisticRegressionCV()
# clf.fit(X.T, Y.T)
clf.fit(X.T, Y.T.ravel())  # ravel flattens the multi-dimensional label array to 1-D, as sklearn expects
You can now plot the decision boundary of these models. Run the code below.
# Plot the decision boundary for logistic regression
# Use the helper function to draw the classifier's decision boundary: a straight line splitting the plane into two regions.
plot_decision_boundary(lambda x: clf.predict(x), X, Y)
plt.title("Logistic Regression")
# Print accuracy
LR_predictions = clf.predict(X.T)  # predicted labels Y_hat
print ('Accuracy of logistic regression: %d ' % float((np.dot(Y,LR_predictions) + np.dot(1-Y,1-LR_predictions))/float(Y.size)*100) +
       '% ' + "(percentage of correctly labelled datapoints)")  # Y*Y_hat + (1-Y)*(1-Y_hat) counts how many predictions match the true labels
Result:
Accuracy of logistic regression: 47 % (percentage of correctly labelled datapoints)
Logistic regression did not work well on the “flower dataset”. You are going to train a Neural Network with a single hidden layer.
Here is our model:
Mathematically:
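For one example $x^{(i)}$:

$$z^{[1](i)} = W^{[1]} x^{(i)} + b^{[1]}$$
$$a^{[1](i)} = \tanh\left(z^{[1](i)}\right)$$
$$z^{[2](i)} = W^{[2]} a^{[1](i)} + b^{[2]}$$
$$\hat{y}^{(i)} = a^{[2](i)} = \sigma\left(z^{[2](i)}\right)$$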
Given the predictions on all the examples, you can also compute the cost J as follows:
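$$J = -\frac{1}{m}\sum_{i=1}^{m}\left( y^{(i)}\log a^{[2](i)} + \left(1-y^{(i)}\right)\log\left(1-a^{[2](i)}\right)\right)$$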
Reminder: The general methodology to build a Neural Network is to:
1. Define the neural network structure ( # of input units, # of hidden units, etc).
2. Initialize the model’s parameters
3. Loop:
- Implement forward propagation
- Compute loss
- Implement backward propagation to get the gradients
- Update parameters (gradient descent)
You often build helper functions to compute steps 1-3 and then merge them into one function we call nn_model(). Once you’ve built nn_model() and learnt the right parameters, you can make predictions on new data.
The point here is the model function: once you have defined the network structure, initialized the parameters, and implemented the cost and gradient computations, combine them all into a single function so the model can easily be applied to new data later.
Exercise: Define three variables:
- n_x: the size of the input layer
- n_h: the size of the hidden layer (set this to 4)
- n_y: the size of the output layer
Hint: Use shapes of X and Y to find n_x and n_y. Also, hard code the hidden layer size to be 4.
def layer_sizes(X, Y):
    n_x = X.shape[0]   # size of the input layer
    n_y = Y.shape[0]   # size of the output layer
    n_h = 4            # size of the hidden layer, hard-coded to 4
    return (n_x, n_h, n_y)
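A minimal sanity check with synthetic inputs (the shapes here are arbitrary; testCases provides the notebook's own test case for this step):

X_t = np.random.randn(5, 3)   # 5 input features, 3 examples
Y_t = np.random.randn(2, 3)   # 2 output units, 3 examples
print(layer_sizes(X_t, Y_t))  # expected: (5, 4, 2)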
Exercise: Implement the function initialize_parameters().
Instructions:
- Make sure your parameters’ sizes are right. Refer to the neural network figure above if needed.
- You will initialize the weights matrices with random values.
- Use: np.random.randn(a,b) * 0.01 to randomly initialize a matrix of shape (a,b).
- You will initialize the bias vectors as zeros.
- Use: np.zeros((a,b)) to initialize a matrix of shape (a,b) with zeros.
def initialize_parameters(n_x, n_h, n_y):
    np.random.seed(2)
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    assert (W1.shape == (n_h, n_x))
    assert (b1.shape == (n_h, 1))
    assert (W2.shape == (n_y, n_h))
    assert (b2.shape == (n_y, 1))
    parameters = {'W1': W1,
                  'b1': b1,
                  'W2': W2,
                  'b2': b2}
    return parameters
The parameters are returned in a dictionary.
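A quick check that the shapes come out as intended (the layer sizes here are arbitrary):

params = initialize_parameters(n_x=2, n_h=4, n_y=1)
for key, value in params.items():
    print(key, value.shape)   # W1: (4, 2), b1: (4, 1), W2: (1, 4), b2: (1, 1)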
Question: Implement forward_propagation().
Instructions:
- Look above at the mathematical representation of your classifier.
- You can use the function sigmoid(). It is built-in (imported) in the notebook.
- You can use the function np.tanh(). It is part of the numpy library.
- The steps you have to implement are:
1. Retrieve each parameter from the dictionary “parameters” (which is the output of initialize_parameters()) by using parameters[“..”].
2. Implement Forward Propagation. Compute Z[1], A[1], Z[2] and A[2] (the vector of all your predictions on all the examples in the training set); the vectorized formulas are written out after this list.
- Values needed in the backpropagation are stored in “cache“. The cache will be given as an input to the backpropagation function.
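Stacking the m training examples as the columns of X, the vectorized computations implemented below are:

$$Z^{[1]} = W^{[1]} X + b^{[1]}, \qquad A^{[1]} = \tanh\left(Z^{[1]}\right)$$
$$Z^{[2]} = W^{[2]} A^{[1]} + b^{[2]}, \qquad A^{[2]} = \sigma\left(Z^{[2]}\right)$$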
Define the forward propagation function:
def forward_propagation(X, parameters):
    # Retrieve each parameter from the dictionary "parameters"
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    # Forward propagation
    Z1 = np.dot(W1, X) + b1
    A1 = np.tanh(Z1)
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)
    assert (A2.shape == (1, X.shape[1]))
    # Values needed for backpropagation are stored in "cache"
    cache = {"Z1": Z1,
             "A1": A1,
             "Z2": Z2,
             "A2": A2}
    return A2, cache
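A minimal synthetic check that the output has shape (1, m):

X_t = np.random.randn(2, 3)                            # 2 features, 3 examples
params = initialize_parameters(n_x=2, n_h=4, n_y=1)
A2_t, cache_t = forward_propagation(X_t, params)
print(A2_t.shape)                                      # (1, 3)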
Now that you have computed A[2] (in the Python variable A2), which contains a[2](i) for every example, you can compute the cross-entropy cost J given above.
Compute the cost function:
Exercise: Implement compute_cost() to compute the value of the cost J.
def compute_cost(A2, Y, parameters):
    m = Y.shape[1]   # number of examples
    cost = -1/m * np.sum(np.multiply(Y, np.log(A2)) + np.multiply((1-Y), np.log(1-A2)))
    cost = np.squeeze(cost)   # makes sure cost is the dimension we expect.
                              # E.g., turns [[17]] into 17
    assert (isinstance(cost, float))
    return cost
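A quick numerical check: if the model outputs 0.5 for every example, the cross-entropy cost is log 2 ≈ 0.693 regardless of the labels:

A2_t = np.full((1, 3), 0.5)            # three predictions of exactly 0.5
Y_t = np.array([[1, 0, 1]])            # arbitrary labels
print(compute_cost(A2_t, Y_t, None))   # ≈ 0.6931 (the parameters argument is unused in this implementation)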
Question: Implement the function backward_propagation().
This is the core of the neural network computation: apply the chain rule to derive the expressions for dW and db (do not forget to divide by m) and obtain the gradients.
The hidden-layer activation used here is tanh; with a sigmoid hidden layer the dZ1 term, and hence the learned weights, would be different. The course recommends tanh. The formulas implemented below are:
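$$dZ^{[2]} = A^{[2]} - Y$$
$$dW^{[2]} = \frac{1}{m}\, dZ^{[2]} A^{[1]T}, \qquad db^{[2]} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[2](i)}$$
$$dZ^{[1]} = W^{[2]T} dZ^{[2]} \ast \left(1 - \left(A^{[1]}\right)^{2}\right) \quad \text{(elementwise product; } 1-a^{2}\text{ is the derivative of } \tanh)$$
$$dW^{[1]} = \frac{1}{m}\, dZ^{[1]} X^{T}, \qquad db^{[1]} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[1](i)}$$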
def backward_propagation(parameters, cache, X, Y):
    m = X.shape[1]
    W1 = parameters['W1']   # shape (n_h, n_x)
    W2 = parameters['W2']   # shape (n_y, n_h)
    b1 = parameters['b1']   # shape (n_h, 1)
    b2 = parameters['b2']   # shape (n_y, 1)
    A1 = cache['A1']        # shape (n_h, m)
    A2 = cache['A2']        # shape (n_y, m)
    Z1 = cache['Z1']        # shape (n_h, m)
    Z2 = cache['Z2']        # shape (n_y, m)
    dZ2 = A2 - Y                                        # (n_y, m)
    dW2 = np.dot(dZ2, A1.T) / m                         # (n_y, n_h)
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m        # (n_y, 1)
    dZ1 = np.multiply(np.dot(W2.T, dZ2), (1 - A1**2))   # (n_h, m)
    dW1 = np.dot(dZ1, X.T) / m                          # (n_h, n_x)
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m        # (n_h, 1)
    grads = {"dW1": dW1,
             "db1": db1,
             "dW2": dW2,
             "db2": db2}
    return grads
Test:
parameters, cache, X_assess, Y_assess = backward_propagation_test_case()
grads = backward_propagation(parameters, cache, X_assess, Y_assess)
print ("dW1 = "+ str(grads["dW1"]))
print ("db1 = "+ str(grads["db1"]))
print ("dW2 = "+ str(grads["dW2"]))
print ("db2 = "+ str(grads["db2"]))
Test results (tanh hidden layer):
dW1 = [[ 0.01018708 -0.00708701]
[ 0.00873447 -0.0060768 ]
[-0.00530847 0.00369379]
[-0.02206365 0.01535126]]
db1 = [[-0.00069728]
[-0.00060606]
[ 0.000364 ]
[ 0.00151207]]
dW2 = [[ 0.00363613 0.03153604 0.01162914 -0.01318316]]
db2 = [[ 0.06589489]]
Question: Implement the update rule. Use gradient descent. You have to use (dW1, db1, dW2, db2) in order to update (W1, b1, W2, b2).
General gradient descent rule: $\theta = \theta - \alpha \frac{\partial J}{\partial \theta}$, where $\alpha$ is the learning rate and $\theta$ stands for any parameter.
Update the weight parameters:
def update_parameters(parameters, grads, learning_rate=1.2):
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    dW1 = grads["dW1"]
    db1 = grads["db1"]
    dW2 = grads["dW2"]
    db2 = grads["db2"]
    # Gradient descent update: the parameters are updated once per iteration
    W1 -= learning_rate * dW1
    b1 -= learning_rate * db1
    W2 -= learning_rate * dW2
    b2 -= learning_rate * db2
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2}
    return parameters
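A minimal check that one update moves a parameter by the expected amount (the values here are arbitrary):

params = {"W1": np.ones((4, 2)), "b1": np.zeros((4, 1)),
          "W2": np.ones((1, 4)), "b2": np.zeros((1, 1))}
grads = {"dW1": np.full((4, 2), 0.1), "db1": np.zeros((4, 1)),
         "dW2": np.full((1, 4), 0.1), "db2": np.zeros((1, 1))}
params = update_parameters(params, grads, learning_rate=1.2)
print(params["W1"][0, 0])   # 1 - 1.2 * 0.1 = 0.88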
Build the model by combining all the previous functions into nn_model():
Question: Build your neural network model in nn_model().
Instructions: The neural network model has to use the previous functions in the right order.
def nn_model(X, Y, n_h, num_iterations=10000, print_cost=False):
    np.random.seed(3)
    n_x = layer_sizes(X, Y)[0]
    n_y = layer_sizes(X, Y)[2]
    # Initialize parameters
    parameters = initialize_parameters(n_x, n_h, n_y)
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    # Gradient descent loop
    for i in range(0, num_iterations):
        A2, cache = forward_propagation(X, parameters)          # forward propagation
        cost = compute_cost(A2, Y, parameters)                  # compute the cost
        grads = backward_propagation(parameters, cache, X, Y)   # backward propagation to get the gradients
        parameters = update_parameters(parameters, grads, learning_rate=1.2)   # one gradient descent update of W and b
        if print_cost and i % 1000 == 0:
            print ("Cost after iteration %i: %f" % (i, cost))
    return parameters
Test:
X_assess, Y_assess = nn_model_test_case()
parameters = nn_model(X_assess, Y_assess, 4, num_iterations=10000, print_cost=False)
print("W1 = " + str(parameters["W1"]))
print("b1 = " + str(parameters["b1"]))
print("W2 = " + str(parameters["W2"]))
print("b2 = " + str(parameters["b2"]))
Result:
W1 = [[ -7.13991779 9.27778317]
[-12.59245311 2.48423279]
[ -7.13853315 9.27827144]
[ 7.12809846 -9.27653998]]
b1 = [[ 3.84586106]
[ 6.33161315]
[ 3.84613897]
[-3.84590913]]
W2 = [[-3040.94994726 -2997.63395067 -3040.57705459 3016.53185899]]
b2 = [[-21.48466572]]
Question: Use your model to predict by building predict().
Use forward propagation to predict results.
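As a reminder, the prediction for example $i$ thresholds the output activation at 0.5:

$$y^{(i)}_{prediction} = \begin{cases} 1 & \text{if } a^{[2](i)} > 0.5 \\ 0 & \text{otherwise} \end{cases}$$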
The predict function:
def predict(parameters, X):
    A2, cache = forward_propagation(X, parameters)
    predictions = (A2 > 0.5)   # boolean array: True where the model predicts class 1 (blue)
    return predictions
Test:
parameters, X_assess = predict_test_case()
predictions = predict(parameters, X_assess)
print("predictions mean = " + str(np.mean(predictions)))
Result:
predictions mean = 0.666666666667
Now run it on the actual flower dataset:
parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True)
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
plt.title("Decision Boundary for hidden layer size " + str(4))
Cost after iteration 0: 0.693048
Cost after iteration 1000: 0.288083
Cost after iteration 2000: 0.254385
Cost after iteration 3000: 0.233864
Cost after iteration 4000: 0.226792
Cost after iteration 5000: 0.222644
Cost after iteration 6000: 0.219731
Cost after iteration 7000: 0.217504
Cost after iteration 8000: 0.219550
Cost after iteration 9000: 0.218633
Check the accuracy:
predictions = predict(parameters, X)
print ('Accuracy: %d' % float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100) + '%')
Result (an accuracy of 90% is already quite high):
Accuracy: 90%
Next, test how different hidden layer sizes affect accuracy:
Run the following code. It may take 1-2 minutes. You will observe different behaviors of the model for various hidden layer sizes.
plt.figure(figsize=(16, 32))
hidden_layer_sizes = [1, 2, 3, 4, 5, 10, 20]   # different numbers of hidden units to try
for i, n_h in enumerate(hidden_layer_sizes):
    plt.subplot(5, 2, i+1)
    plt.title('Hidden Layer of size %d' % n_h)
    parameters = nn_model(X, Y, n_h, num_iterations = 5000)
    plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
    predictions = predict(parameters, X)
    accuracy = float((np.dot(Y,predictions.T) + np.dot(1-Y,1-predictions.T))/float(Y.size)*100)
    print ("Accuracy for {} hidden units: {} %".format(n_h, accuracy))
Results:
Accuracy for 1 hidden units: 67.5 %
Accuracy for 2 hidden units: 67.25 %
Accuracy for 3 hidden units: 90.75 %
Accuracy for 4 hidden units: 90.5 %
Accuracy for 5 hidden units: 91.25 %
Accuracy for 10 hidden units: 90.25 %
Accuracy for 20 hidden units: 90.5 %
Interpretation:
- The larger models (with more hidden units) are able to fit the training set better, until eventually the largest models overfit the data.
- The best hidden layer size seems to be around n_h = 5. Indeed, a value around here seems to fit the data well without incurring noticeable overfitting.
- You will also learn later about regularization, which lets you use very large models (such as n_h = 50) without much overfitting.
You’ve learnt to:
- Build a complete neural network with a hidden layer
- Make good use of a non-linear activation unit
- Implement forward propagation and backpropagation, and train a neural network
- See the impact of varying the hidden layer size, including overfitting.
Performance on other datasets
If you want, you can rerun the whole notebook (minus the dataset part) for each of the following datasets.
Load and visualize one of the extra datasets:
# Datasets
noisy_circles, noisy_moons, blobs, gaussian_quantiles, no_structure = load_extra_datasets()
datasets = {"noisy_circles": noisy_circles,
"noisy_moons": noisy_moons,
"blobs": blobs,
"gaussian_quantiles": gaussian_quantiles}
### START CODE HERE ### (choose your dataset)
dataset = "noisy_circles"
### END CODE HERE ###
X, Y = datasets[dataset]
X, Y = X.T, Y.reshape(1, Y.shape[0])
# make blobs binary (the blobs dataset has more than two classes)
if dataset == "blobs":
    Y = Y % 2
# Visualize the data
plt.scatter(X[0, :], X[1, :], c=np.squeeze(Y), s=40, cmap=plt.cm.Spectral);
Test:
parameters = nn_model(X, Y, n_h = 4, num_iterations = 10000, print_cost=True)
plot_decision_boundary(lambda x: predict(parameters, x.T), X, Y)
plt.title("Decision Boundary for hidden layer size " + str(4))
Result:
Cost after iteration 0: 0.693148
Cost after iteration 1000: 0.399812
Cost after iteration 2000: 0.398242
Cost after iteration 3000: 0.398877
Cost after iteration 4000: 0.399356
Cost after iteration 5000: 0.399656
Cost after iteration 6000: 0.399708
Cost after iteration 7000: 0.399696
Cost after iteration 8000: 0.399675
Cost after iteration 9000: 0.399649
You can try the other datasets in the same way; the model performs well on them too.
Reference:
- http://scs.ryerson.ca/~aharley/neural-networks/
- http://cs231n.github.io/neural-networks-case-study/