PyTorch: neural network

This post is based on an official PyTorch tutorial, with some additions from my own learning; it will be updated over time: https://pytorch.org/tutorials/beginner/blitz/neural_networks_tutorial.html

It mainly records the key points needed to build neural networks in PyTorch.

A typical training procedure for a neural network looks like this (a compact sketch assembling these steps appears right after this list):

  • Define a neural network with some learnable parameters (weights)
  • Iterate over a dataset of inputs
  • Process the input through the network
  • Compute the loss (how far the output is from the correct, ground-truth target)
  • Propagate gradients back into the network's parameters
  • Update the weights of the network, typically using a simple update rule:
    weights = weights - learning_rate * gradient
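
A compact sketch of how these steps fit together in code (assuming dataset, net, criterion, and learning_rate are already defined; all names here are placeholders):

for input, target in dataset:            # iterate over the input dataset
    output = net(input)                  # process the input through the network
    loss = criterion(output, target)     # compute the loss against the target
    net.zero_grad()                      # clear previously accumulated gradients
    loss.backward()                      # propagate gradients back to the parameters
    for p in net.parameters():           # simple update rule (SGD by hand)
        p.data.sub_(learning_rate * p.grad.data)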

Define a neural network with some learnable parameters (weights)

Define the network

import torch
import torch.nn as nn
import torch.nn.functional as F


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)
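
The learnable parameters of a model are returned by net.parameters(), and this net expects a 32x32 input (with batch and channel dimensions). For example:

params = list(net.parameters())
print(len(params))           # 10
print(params[0].size())      # conv1's weight: torch.Size([6, 1, 3, 3])

input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)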

PyTorch's nn.Module is similar to Keras's keras.Model: the layers are defined in the initializer and the network is wired together when the model is called. The difference is that keras.Model defines the forward computation in call(), while PyTorch uses forward().

Both are of course invoked with parentheses, i.e. by calling the model instance directly, and the reason lies in how Python's __init__() and __call__() methods work.
Writing class_name()() means the first pair of parentheses triggers __init__() (instantiation) and the second pair triggers __call__() on the resulting instance. nn.Module's forward() and keras.Model's call() are both dispatched from __call__(): each base class implements __call__() so that it ends up calling forward() / call().
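
A minimal, framework-free illustration of __init__() versus __call__():

class Greeter:
    def __init__(self, name):       # runs on Greeter("world")
        self.name = name

    def __call__(self):             # runs when the instance itself is called
        return "hello " + self.name

print(Greeter("world")())           # first () -> __init__, second () -> __call__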

Let's take a look at nn.Module's __call__() method:

def __call__(self, *input, **kwargs):
    for hook in self._forward_pre_hooks.values():
        result = hook(self, input)
        if result is not None:
            if not isinstance(result, tuple):
                result = (result,)
            input = result
    if torch._C._get_tracing_state():
        result = self._slow_forward(*input, **kwargs)
    else:
        result = self.forward(*input, **kwargs)
    for hook in self._forward_hooks.values():
        hook_result = hook(self, input, result)
        if hook_result is not None:
            result = hook_result
    if len(self._backward_hooks) > 0:
        var = result
        while not isinstance(var, torch.Tensor):
            if isinstance(var, dict):
                var = next((v for v in var.values() if isinstance(v, torch.Tensor)))
            else:
                var = var[0]
        grad_fn = var.grad_fn
        if grad_fn is not None:
            for hook in self._backward_hooks.values():
                wrapper = functools.partial(hook, self)
                functools.update_wrapper(wrapper, hook)
                grad_fn.register_hook(wrapper)
    return result
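
The _forward_pre_hooks and _forward_hooks seen above are what register_forward_pre_hook() and register_forward_hook() populate. A quick sketch of a forward hook on the net defined earlier (the hook function name is just for illustration):

def print_shape_hook(module, input, output):
    # called right after forward(); lets you inspect the module's output
    print(module.__class__.__name__, "->", output.shape)

handle = net.conv1.register_forward_hook(print_shape_hook)
net(torch.randn(1, 1, 32, 32))      # prints: Conv2d -> torch.Size([1, 6, 30, 30])
handle.remove()                     # unregister the hook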

On PyTorch's predefined layers: convolution, pooling, activation, normalization, dropout, etc.

There are two forms: nn.Xxx and nn.functional.xxx.

Everything defined under nn.Xxx is a class inheriting from the common ancestor Module, whereas everything under nn.functional.xxx is a pure function. Some of these operations, such as conv, ultimately call into functions written in C++ for the actual computation.

nn.Xxx is essentially a wrapper around nn.functional.xxx, much like Keras is a wrapper around the Keras backend — and now that TensorFlow ships its own Keras, tf.keras, the analogy is roughly the same.

Take the conv2d operation as an example. nn.functional.conv2d takes (input, weight, bias, stride, padding, dilation, groups), so the weights and bias have to be passed in by hand. nn.Conv2d, on the other hand, initializes the weights and bias in __init__() (more precisely in the _ConvNd class: Conv2d inherits from _ConvNd, which inherits from Module) and then calls nn.functional.conv2d inside its forward().
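
A small sketch of the two styles side by side (in the functional form the weight and bias are created and passed in explicitly):

x = torch.randn(1, 1, 32, 32)

# module style: the parameters live inside the module
conv = nn.Conv2d(1, 6, 3)
y1 = conv(x)

# functional style: you manage the parameters yourself
weight = torch.randn(6, 1, 3, 3)    # (out_channels, in_channels, kH, kW)
bias = torch.randn(6)
y2 = F.conv2d(x, weight, bias, stride=1, padding=0)

print(y1.shape, y2.shape)           # both torch.Size([1, 6, 30, 30])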

Related posts on the difference between nn.Xxx and nn.functional.xxx:

https://www.jianshu.com/p/7bb495573cb9
https://www.zhihu.com/question/66782101

Process the input through the network

torch.nn only supports mini-batches. The entire torch.nn package only supports inputs that are a mini-batch of samples, and not a single sample.

For example, nn.Conv2d will take in a 4D Tensor of nSamples x nChannels x Height x Width.

If you have a single sample, just use input.unsqueeze(0) to add a fake batch dimension.

torch.nn only accepts mini-batches as input, i.e. the input needs an extra leading dimension for the samples.
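
For example, a single grayscale image of shape (1, 32, 32) gets its fake batch dimension like this:

single = torch.randn(1, 32, 32)     # C x H x W, one sample
batch = single.unsqueeze(0)         # 1 x C x H x W, a mini-batch of one
print(batch.shape)                  # torch.Size([1, 1, 32, 32])
out = net(batch)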

Compute the loss

input = torch.randn(1, 1, 32, 32)  # a random 32x32 input with batch and channel dims
output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

Now, if you follow loss in the backward direction, using its .grad_fn attribute, you will see a graph of computations that looks like this:

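For example, stepping backwards from loss through .grad_fn:

print(loss.grad_fn)                                             # MSELoss
print(loss.grad_fn.next_functions[0][0])                        # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])   # ReLU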

Propagate gradients back into the network's parameters

When loss.backward() is executed, the whole graph is differentiated with respect to loss, and every tensor in the graph with requires_grad=True will have its gradient accumulated into its .grad tensor.
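
From the tutorial, you can watch this happen on conv1's bias (clear the existing gradients first, otherwise the new ones would be accumulated on top):

net.zero_grad()     # zero the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)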

Gradient updates

With the default reduction='mean', PyTorch's loss functions average over the batch, so the gradients used for the update are effectively divided by batch_size. Also note that if net.zero_grad() is not called, every training step accumulates gradients on top of the previous ones.
This post runs an experiment on this: https://www.jb51.net/article/168006.htm
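
A tiny toy example of the accumulation behaviour (the tensors here are made up purely for illustration):

w = torch.ones(1, requires_grad=True)
x = torch.tensor([2.0])

(w * x).sum().backward()
print(w.grad)              # tensor([2.])

(w * x).sum().backward()   # forward + backward again without clearing
print(w.grad)              # tensor([4.])  -- the gradients were accumulated

w.grad.zero_()             # what net.zero_grad() does for every parameter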

Update the network's weights with a simple update rule

# Stochastic Gradient Descent (SGD) 
learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

Using other gradient-descent methods: torch.optim

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step() 

optimizer.zero_grad() clears the gradients accumulated in the parameters; each batch needs to clear the gradients accumulated by the previous batch.
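
Putting it together, a typical loop over mini-batches clears the gradients at every iteration (a sketch assuming a DataLoader named trainloader):

for epoch in range(2):
    for inputs, targets in trainloader:   # one mini-batch per iteration
        optimizer.zero_grad()             # clear gradients from the previous batch
        outputs = net(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()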
