Purpose: dropout [1] regularizes neural networks by randomly setting some features to zero during the forward pass.
1.dropout_forward
import numpy as np

def dropout_forward(x, dropout_param):
    """Forward pass for (inverted) dropout.

    dropout_param contains:
    - p: keep probability; each activation is kept with probability p.
    - mode: 'train' or 'test'.
    - seed (optional): seed for the random mask, useful for gradient checking.
    """
    p, mode = dropout_param['p'], dropout_param['mode']
    if 'seed' in dropout_param:
        np.random.seed(dropout_param['seed'])
    mask = None
    out = None
    if mode == 'train':
        # Keep each unit with probability p and divide by p so that
        # E[out] == E[x] (inverted dropout).
        mask = (np.random.rand(*x.shape) < p) / p
        out = x * mask
    elif mode == 'test':
        # At test time dropout does nothing.
        out = x
    cache = (dropout_param, mask)
    out = out.astype(x.dtype, copy=False)
    return out, cache
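A quick sanity check (a minimal sketch, assuming only NumPy and the dropout_forward above; the shapes and the offset of 10 are arbitrary): with inverted dropout, the mean of the train-mode output should stay close to the mean of the input, and the fraction of zeroed entries should be roughly 1 - p.

import numpy as np

np.random.seed(231)
x = np.random.randn(500, 500) + 10  # shift so the mean is clearly nonzero

for p in [0.25, 0.4, 0.7]:
    out_train, _ = dropout_forward(x, {'mode': 'train', 'p': p})
    out_test, _ = dropout_forward(x, {'mode': 'test', 'p': p})
    print('p =', p)
    print('  mean of input:       ', x.mean())
    print('  mean of train output:', out_train.mean())        # ~ x.mean(), thanks to the / p scaling
    print('  mean of test output: ', out_test.mean())         # identical to x.mean()
    print('  fraction zeroed:     ', (out_train == 0).mean())  # ~ 1 - p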
2.dropout_backward
def dropout_backward(dout, cache):
    """Backward pass for (inverted) dropout."""
    dropout_param, mask = cache
    mode = dropout_param['mode']
    dx = None
    if mode == 'train':
        # Gradient flows only through the kept units, scaled by the same 1/p factor.
        dx = dout * mask
    elif mode == 'test':
        dx = dout
    return dx
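A numeric gradient check is a natural follow-up. This is a self-contained sketch: instead of the assignment's helper functions it uses an inline centered-difference loop, and the shapes, keep probability, and seed are arbitrary. Fixing the seed makes the random mask reproducible across forward calls, so finite differences are valid.

import numpy as np

np.random.seed(231)
x = np.random.randn(10, 10) + 10
dout = np.random.randn(*x.shape)
dropout_param = {'mode': 'train', 'p': 0.2, 'seed': 123}

out, cache = dropout_forward(x, dropout_param)
dx = dropout_backward(dout, cache)

# Centered-difference numeric gradient of sum(out * dout) with respect to x.
h = 1e-5
dx_num = np.zeros_like(x)
it = np.nditer(x, flags=['multi_index'])
while not it.finished:
    ix = it.multi_index
    old = x[ix]
    x[ix] = old + h
    pos, _ = dropout_forward(x, dropout_param)  # same seed -> same mask
    x[ix] = old - h
    neg, _ = dropout_forward(x, dropout_param)
    x[ix] = old
    dx_num[ix] = np.sum((pos - neg) * dout) / (2 * h)
    it.iternext()

rel_err = np.max(np.abs(dx - dx_num) / np.maximum(1e-8, np.abs(dx) + np.abs(dx_num)))
print('max relative error:', rel_err)  # should be very small, e.g. < 1e-8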
We train two identical two-layer networks: one will use no dropout, and one will use a keep probability of 0.25 (see the training sketch below).
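A training-loop sketch for this comparison, assuming the FullyConnectedNet and Solver classes from the cs231n codebase: the constructor keyword for the keep probability (written here as dropout_keep_ratio) differs between assignment versions, and small_data and the hyperparameters are placeholders, not the official settings.

# small_data: dict with 'X_train', 'y_train', 'X_val', 'y_val' (a small subset of the training set).
solvers = {}
for keep_ratio in [1.0, 0.25]:  # 1.0 = no dropout, 0.25 = keep probability of 0.25
    model = FullyConnectedNet([500], dropout_keep_ratio=keep_ratio)  # keyword name is an assumption
    solver = Solver(model, small_data,
                    num_epochs=25, batch_size=100,
                    update_rule='adam',
                    optim_config={'learning_rate': 5e-4},
                    verbose=False)
    solver.train()
    solvers[keep_ratio] = solver
    print('keep ratio %.2f: train acc %.3f, val acc %.3f'
          % (keep_ratio, solver.train_acc_history[-1], solver.val_acc_history[-1]))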
Inline Question 1:
What happens if we do not divide the values being passed through inverse dropout by p in the dropout layer? Why does that happen?
Answer:
At test time dropout does nothing, so the expected value of the output is just the input itself. At training time, however, dropout changes the expectation: if an input x is kept with probability p, the expected output is E(x̂) = p·x + (1−p)·0 = p·x. Since we want the expectations at training and test time to match, we divide by p in the training-time forward pass so that the expected output stays equal to x. Without the division by p, the training-time activations would be scaled down by a factor of p relative to the test-time activations.
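A quick numeric illustration of this expectation argument (a minimal sketch, assuming only NumPy; the shapes and the offset of 10 are arbitrary): without the division by p, the train-time mean shrinks to roughly p times the input mean, so it no longer matches the test-time output.

import numpy as np

np.random.seed(0)
x = np.random.randn(1000, 1000) + 10
p = 0.25  # keep probability

keep = np.random.rand(*x.shape) < p
no_scaling = x * keep        # E[out] = p * E[x]
inverted   = x * keep / p    # E[out] = E[x]

print('mean of x:              ', x.mean())
print('train mean, no scaling: ', no_scaling.mean())  # ~ p * x.mean()
print('train mean, inverted:   ', inverted.mean())    # ~ x.mean()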
Inline Question 2:
Compare the validation and training accuracies with and without dropout -- what do your results suggest about dropout as a regularizer?
Answer:
With dropout, the training accuracy is lower than without dropout, while the validation accuracy is somewhat higher. This shows that dropout acts as a regularizer: it reduces overfitting to the training set and improves generalization.
Inline Question 3:
Suppose we are training a deep fully-connected network for image classification, with dropout after hidden layers (parameterized by keep probability p). How should we modify p, if at all, if we decide to decrease the size of the hidden layers (that is, the number of nodes in each layer)?
Answer:
If we decrease the size of the hidden layers, i.e. reduce the number of neurons per layer, the keep probability p should be increased. Consider the extreme case where a hidden layer has only one neuron: if the keep probability stays very small, then most of the time that unit is dropped and the network effectively receives no training signal at all.
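A small illustration of this extreme case (a sketch with made-up layer sizes): with keep probability p, a hidden layer of size H has on average p·H active units per forward pass, and the entire layer is zeroed out with probability (1−p)^H, which becomes large for small H.

p = 0.25  # keep probability
for H in [512, 16, 4, 1]:
    expected_active = p * H
    prob_all_dropped = (1 - p) ** H
    print(f'H={H:4d}: E[active units]={expected_active:7.2f}, '
          f'P(all units dropped)={prob_all_dropped:.3f}')
# For H=1 the whole layer is dropped 75% of the time, so a larger p is needed.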
[1] Geoffrey E. Hinton et al., "Improving neural networks by preventing co-adaptation of feature detectors", arXiv 2012.