This article is based on an official PyTorch tutorial, with additions from other material encountered while learning; it is continuously updated: https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html
It mainly records the essential points for building neural networks in PyTorch.
The typical training procedure for a neural network is as follows:
- define the network and its learnable parameters (weights);
- iterate over a dataset of inputs;
- process each input through the network;
- compute the loss (how far the output is from being correct);
- propagate gradients back into the network's parameters;
- update the weights, typically with a simple rule:
weights = weights - learning_rate * gradient
Comparison between Parameter and Variable in PyTorch
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)
PyTorch's nn.Module is similar to Keras's keras.Model (tf.keras.Model): layers are defined in the initializer, and the network is assembled when the model is called. The only difference is that keras.Model describes the model's computation in call(), while PyTorch uses forward().
Both, of course, are run by calling the model object with parentheses, i.e. model_name(input), and the reason lies in how Python's __init__() and __call__() methods are invoked.
If a Python class is used as class_name()(), the first pair of parentheses invokes __init__() and the second pair invokes __call__(). Both nn.Module's forward() and keras.Model's call() are dispatched from __call__(); in other words, both classes override the __call__() method.
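As a quick check (a minimal sketch, reusing the Net defined above with a dummy 1x1x32x32 input), calling the module object goes through nn.Module.__call__(), which then runs forward():

x = torch.randn(1, 1, 32, 32)
out1 = net(x)          # goes through nn.Module.__call__(), which runs hooks and then forward()
out2 = net.forward(x)  # calls forward() directly, bypassing the hook machinery
print(out1.shape, out2.shape)  # both torch.Size([1, 10])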
Let's take a look at the __call__() method of the nn.Module class:
def __call__(self, *input, **kwargs):
    for hook in self._forward_pre_hooks.values():
        result = hook(self, input)
        if result is not None:
            if not isinstance(result, tuple):
                result = (result,)
            input = result
    if torch._C._get_tracing_state():
        result = self._slow_forward(*input, **kwargs)
    else:
        result = self.forward(*input, **kwargs)
    for hook in self._forward_hooks.values():
        hook_result = hook(self, input, result)
        if hook_result is not None:
            result = hook_result
    if len(self._backward_hooks) > 0:
        var = result
        while not isinstance(var, torch.Tensor):
            if isinstance(var, dict):
                var = next((v for v in var.values() if isinstance(v, torch.Tensor)))
            else:
                var = var[0]
        grad_fn = var.grad_fn
        if grad_fn is not None:
            for hook in self._backward_hooks.values():
                wrapper = functools.partial(hook, self)
                functools.update_wrapper(wrapper, hook)
                grad_fn.register_hook(wrapper)
    return result
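The loop over self._forward_hooks above is what makes register_forward_hook() work. A minimal sketch (reusing the Net defined earlier; the hook function name is only for illustration):

def print_shape_hook(module, inp, out):
    # a forward hook receives the module, its input tuple, and its output
    print(module.__class__.__name__, 'output shape:', out.shape)

handle = net.conv1.register_forward_hook(print_shape_hook)
_ = net(torch.randn(1, 1, 32, 32))  # prints: Conv2d output shape: torch.Size([1, 6, 30, 30])
handle.remove()  # detach the hook once it is no longer needed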
There are two styles of API: nn.Xxx and nn.functional.xxx.
Everything defined under nn.Xxx is a class inheriting from the common ancestor Module, while everything under nn.functional.xxx is a pure function. Some of these operations, such as conv, ultimately call functions written in C++ to do the actual computation.
nn.Xxx is essentially a wrapper around nn.functional.xxx, much like Keras wraps the Keras backend (and now that TensorFlow ships its own Keras, tf.keras, the analogy is roughly the same).
Take conv2d as an example: nn.functional.conv2d takes (input, weight, bias, stride, padding, dilation, groups), so the weights and bias have to be passed in by hand. nn.Conv2d, on the other hand, initializes the weights and bias in __init__() (strictly speaking, in the _ConvNd class: Conv2d inherits from _ConvNd, which inherits from Module) and then calls nn.functional.conv2d in its forward().
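A minimal sketch of the two styles side by side (an arbitrary 1-channel input; the equivalence only holds because nn.Conv2d's own weight and bias are fed into the functional call):

x = torch.randn(1, 1, 32, 32)

conv = nn.Conv2d(1, 6, 3)                           # the module owns weight and bias as Parameters
y1 = conv(x)

y2 = F.conv2d(x, conv.weight, conv.bias, stride=1)  # functional form: weight and bias passed explicitly
print(torch.allclose(y1, y2))                       # True: the module just wraps the functional op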
Related posts on the difference between nn.Xxx and nn.functional.xxx:
https://www.jianshu.com/p/7bb495573cb9
https://www.zhihu.com/question/66782101
torch.nn only supports mini-batches. The entire torch.nn package only supports inputs that are a mini-batch of samples, and not a single sample.
For example, nn.Conv2d will take in a 4D Tensor of nSamples x nChannels x Height x Width.
If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension.
In other words, torch.nn only accepts mini-batch input: the input needs one extra leading dimension for the samples.
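A quick sketch of adding that fake batch dimension (reusing the net defined above; the shapes are only for illustration):

single = torch.randn(1, 32, 32)   # one 1-channel 32x32 image: C x H x W
batched = single.unsqueeze(0)     # now 1 x 1 x 32 x 32: nSamples x nChannels x H x W
print(batched.shape)              # torch.Size([1, 1, 32, 32])
out = net(batched)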
https://pytorch.org/docs/stable/nn.html#l1loss
input = torch.randn(1, 1, 32, 32)  # a random 32x32 input, as in the official tutorial
output = net(input)
target = torch.randn(10)       # a dummy target, for example
target = target.view(1, -1)    # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)
Now, if you follow loss in the backward direction, using its .grad_fn attribute, you will see a graph of computations.
That is, tracing loss backwards through its .grad_fn attribute reveals the computation graph that produced it.
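For example (this trace follows the official tutorial; the exact class names printed can vary between PyTorch versions):

print(loss.grad_fn)                                            # MSELoss
print(loss.grad_fn.next_functions[0][0])                       # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU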
When loss.backward() is executed, the whole graph is differentiated with respect to the loss, and every tensor in the graph with requires_grad=True has its gradient accumulated into its .grad attribute.
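The official tutorial shows this by looking at conv1's bias gradient before and after the backward pass:

net.zero_grad()   # zero the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)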
Note that with the default reduction='mean' of loss functions such as MSELoss, the gradient is effectively averaged over the batch, i.e. divided by batch_size. Also, if net.zero_grad() is not called, the gradients of every iteration keep accumulating.
This post runs an experiment on exactly that: https://www.jb51.net/article/168006.htm
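A small sketch of that accumulation effect (reusing input, target, and criterion from above; the forward pass is deterministic here, so the second backward adds the same gradient on top of the first):

net.zero_grad()
criterion(net(input), target).backward()
g1 = net.conv1.bias.grad.clone()

criterion(net(input), target).backward()   # no zero_grad in between: gradients are accumulated
g2 = net.conv1.bias.grad
print(torch.allclose(g2, 2 * g1))          # True: the gradient doubled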
# Stochastic Gradient Descent (SGD)
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)
The same update can be done with the torch.optim package:
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()
optimizer.zero_grad() is there to clear the gradients accumulated on the weights: every batch has to clear the gradients accumulated by the previous batch.
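Putting it all together, a hedged sketch of a few training steps on the same dummy batch (just to show the zero_grad -> forward -> loss -> backward -> step cycle; real training would iterate over a DataLoader):

for step in range(3):
    optimizer.zero_grad()             # clear gradients left over from the previous step
    output = net(input)
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    print(step, loss.item())          # the loss should shrink on this fixed dummy batch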