PyTorch is a Python package built on top of the Torch library and open-sourced by Facebook as a neural-network framework. It provides a NumPy-like abstraction for representing tensors (multi-dimensional arrays) and can use GPUs to accelerate training. Torch is a classic tensor library for operating on multi-dimensional matrix data; on top of it, PyTorch adds deep neural networks with an automatic differentiation system, giving a highly flexible and efficient platform for deep-learning experiments. Unlike TensorFlow's static computation graph, PyTorch's computation graph is dynamic: it can change on the fly as the computation requires.

Installation

# CPU-only build
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
# GPU (CUDA) build (the default index serves CUDA-enabled wheels)
pip3 install torch torchvision torchaudio

Common PyTorch packages

https://pytorch.org/docs/stable/torch.html

  • torch: a general-purpose array library similar to NumPy; tensors can be converted to CUDA types (e.g. torch.cuda.FloatTensor) and computed on the GPU (a quick device check is sketched below).
  • torch.autograd: a package for building computation graphs and obtaining gradients automatically.
  • torch.nn: a neural-network library with reusable layers and loss functions.
  • torch.optim: an optimization package with common algorithms such as SGD and Adam.
  • torch.utils: data loaders, plus training helpers and other convenience utilities.

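A minimal sketch of picking a device and moving a tensor to the GPU when one is available (assumes nothing beyond a standard PyTorch install; the machine may or may not have a CUDA GPU):

import torch

# Pick the GPU if CUDA is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

x = torch.rand(3, 4)   # created on the CPU by default
x = x.to(device)       # moved to the selected device
print(x.device)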

Tensors

Tensor shape terminology

  • Shape: the length (number of elements) of each dimension of the tensor.
  • Rank: the number of dimensions. A scalar has rank 0, a vector rank 1, a matrix rank 2.
  • Axis (dimension): one particular dimension of the tensor.
  • Size: the total number of elements, i.e. the product of the entries of the shape vector (see the example below).
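A small illustration of these terms on a concrete tensor, using only standard tensor attributes:

import torch

t = torch.zeros(2, 3, 4)   # a rank-3 tensor
print(t.shape)             # torch.Size([2, 3, 4]): the shape
print(t.ndim)              # 3: the rank (number of dimensions)
print(t.numel())           # 24: the size, i.e. 2 * 3 * 4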

Tensor creation

| method | desc |
| :--- | :--- |
| torch.tensor(data) | create a tensor from existing data |
| torch.ones_like(x) / torch.zeros_like(x) / torch.rand_like(x) | create a tensor with the same shape as x, filled with ones, zeros, or samples drawn uniformly from [0, 1) |
| torch.normal(mean, std) | create a tensor of random numbers; the mean and standard-deviation tensors define the normal distribution each element is drawn from |
| torch.zeros(shape) / torch.ones(shape) / torch.eye(n) / torch.full(shape, fill_value) / torch.empty(shape) | create a tensor by value: all zeros, all ones, the identity matrix, all fill_value, or uninitialized memory |
| torch.arange(start, end, step) / torch.linspace(start, end, steps) / torch.logspace(start, end, steps) | create a tensor over a range: a fixed step size, a given number of evenly spaced points, or logarithmically spaced points |
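A few of these constructors in action (the random values differ from run to run; the sizes are purely illustrative):

import torch

a = torch.tensor([[1, 2], [3, 4]])                         # from a nested list
b = torch.zeros_like(a)                                    # same shape as a, all zeros
c = torch.full((2, 3), 7.0)                                # 2x3 tensor filled with 7.0
d = torch.normal(mean=torch.zeros(3), std=torch.ones(3))   # standard normal samples
e = torch.arange(0, 10, 2)                                 # tensor([0, 2, 4, 6, 8])
f = torch.linspace(0, 1, steps=5)                          # tensor([0.00, 0.25, 0.50, 0.75, 1.00])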

Converting to and from NumPy

| method | desc |
| :--- | :--- |
| torch.as_tensor(ndarray) / torch.from_numpy(ndarray) | convert a NumPy array into a PyTorch tensor |
| tensor.numpy() | convert a PyTorch tensor into a NumPy array |
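A short round trip; note that torch.from_numpy and tensor.numpy() share the underlying memory, so an in-place change on one side is visible on the other:

import numpy as np
import torch

arr = np.array([1.0, 2.0, 3.0])
t = torch.from_numpy(arr)    # shares memory with arr
t[0] = 10.0
print(arr)                   # [10.  2.  3.]: the NumPy array sees the change

back = t.numpy()             # back to NumPy, still sharing memory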

Basic tensor operations

Indexing and slicing

Tensors follow Python's indexing rules:

  • indexing starts at 0
  • negative indices count from the end
  • the colon : is used for slicing, start:stop:step (see the example below)
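A quick illustration on a 2-D tensor:

import torch

m = torch.arange(12).reshape(3, 4)
print(m[0])         # first row
print(m[-1])        # last row
print(m[:, 1])      # second column
print(m[0:2, ::2])  # first two rows, every other column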

Broadcasting

Broadcasting is a concept borrowed from the equivalent feature in NumPy. In short, when a combined operation is performed on a set of tensors, the smaller tensor is, under certain conditions, "expanded" to match the larger one.
Broadcasting mainly applies in two situations: the tensors have different numbers of dimensions but their trailing dimensions match, or one of the aligned dimensions has length 1.
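Both cases in one small example:

import torch

a = torch.ones(3, 4)     # shape (3, 4)
b = torch.arange(4.0)    # shape (4,): trailing dimension matches, so b is broadcast over the rows
print((a + b).shape)     # torch.Size([3, 4])

c = torch.ones(3, 1)     # shape (3, 1): the length-1 axis is stretched to 4
print((a + c).shape)     # torch.Size([3, 4])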

Mathematical operations

| func | desc | func | desc |
| :--- | :--- | :--- | :--- |
| t.abs() | absolute value | torch.add(t, t) or + | element-wise addition |
| t.sum() | sum of all elements | torch.log(t) | logarithm |
| t.ceil() | round up | torch.median(t) | median |
| t.floor() | round down | torch.mean(t) | mean |
| t.exp() | exponential | torch.sqrt(t) | square root |
| t.prod() | product of all elements | torch.sign(t) | sign |
| t.pow() | power | torch.mul() or * | element-wise multiplication |
| torch.dot(t1, t2) | dot product of 1-D tensors | torch.mm(mat1, mat2) / matmul(mat1, mat2) / @ | matrix multiplication |
| t.t() | transpose | torch.cumprod(t, axis) | cumulative product along a dimension |
| torch.cumsum(t, axis) | cumulative sum along a dimension | torch.std / var / sum | standard deviation / variance / sum |
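For instance, element-wise multiplication versus matrix multiplication:

import torch

A = torch.tensor([[1., 2.], [3., 4.]])
B = torch.tensor([[5., 6.], [7., 8.]])

print(A * B)                       # element-wise product
print(A @ B)                       # matrix product, same as torch.mm(A, B)
print(A.sum(), A.mean(), A.t())    # reduction and transpose examples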

Automatic differentiation

The autograd module

The autograd package provides automatic differentiation for tensors. To use it, keep the following points in mind:

  1. When creating a leaf-node tensor, the requires_grad argument specifies whether operations on it are recorded so that backward() can later compute gradients. requires_grad defaults to False; set it to True if you need gradients with respect to that tensor.
  2. The requires_grad_() method changes the requires_grad attribute in place. Calling .detach(), or wrapping code in with torch.no_grad():, stops gradients from being computed and history from being tracked. This is commonly used when evaluating or testing a model.
  3. A tensor created by an operation (a non-leaf node) is automatically given a grad_fn attribute, which refers to the function that produced it. For leaf nodes, grad_fn is None.
  4. Calling backward() on the final tensor computes the gradients of all participating variables and accumulates the results into their grad attributes. After the computation, the gradients of non-leaf nodes are freed automatically.
  5. backward() accepts an argument that must have the same shape as the tensor it is called on. If that tensor is a scalar (a single number), the argument can be omitted.
  6. Intermediate buffers are cleared during backpropagation; to backpropagate more than once, pass retain_graph=True to backward(). Across multiple backward passes, gradients are accumulated.
  7. The gradients of non-leaf nodes are freed as soon as backward() has been called.
  8. Wrapping a code block in torch.no_grad() prevents autograd from tracking the history of tensors marked requires_grad=True. This is frequently used during testing.
    Throughout this process, PyTorch organizes the computation as a graph. The graph is dynamic: it is rebuilt on every forward pass. Other deep-learning frameworks, such as TensorFlow and Keras, traditionally use static graphs.

Scalar backpropagation

Suppose x, w, and b are scalars and z = wx + b. Calling backward() on the scalar z requires no argument.

import torch
x, w, b = torch.tensor(2.0), torch.tensor(1.0, requires_grad=True), torch.tensor(0.5, requires_grad=True)
z = w * x + b
z.backward()
print(w.grad)  # dz/dw = x = 2.
print(b.grad)  # dz/db = 1.

Non-scalar backpropagation

PyTorch has a simple rule: it does not differentiate a tensor with respect to a tensor, only a scalar with respect to a tensor. So if backward() is called on a non-scalar tensor, a gradient argument must be passed in; this argument is itself a tensor and must have the same shape as the tensor on which backward() is called.

"""
Non-scalar backpropagation
"""
import torch
# Define leaf-node tensors a and b
a = torch.tensor([2, 3], dtype=torch.float, requires_grad=True)
b = torch.tensor([5, 6], dtype=torch.float, requires_grad=True)
# Define the mapping from a and b to f
f = a * b
# Backward pass
# f is a non-scalar tensor, so a gradient argument with the same shape as f must be passed to backward()
gradients = torch.ones_like(f)
f.backward(gradient=gradients)

print(a.grad) # tensor([5., 6.])
print(b.grad) # tensor([2., 3.])

TensorFlow builds a static computation graph once, from the first forward pass of the model, and all subsequent automatic differentiation is performed on that graph. PyTorch instead builds a new dynamic graph on every forward pass, so each iteration uses a fresh graph. This makes it very flexible and easy to adjust.

Data loading

  • torch.utils.data.Dataset: the abstraction for loading data.
  • torch.utils.data.TensorDataset: wraps one or more tensors into a dataset.
  • torch.utils.data.DataLoader: the data loader; it combines a dataset with a sampler and can use multiple worker processes to load the data.
torch.utils.data.DataLoader(dataset, batch_size=1, shuffle=False, num_workers=0, drop_last=False)
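A minimal sketch of wrapping two tensors into a TensorDataset and iterating over mini-batches (the tensor shapes here are arbitrary):

import torch
from torch.utils.data import TensorDataset, DataLoader

features = torch.randn(100, 13)             # 100 samples, 13 features each
targets = torch.randn(100, 1)

dataset = TensorDataset(features, targets)  # pairs features[i] with targets[i]
loader = DataLoader(dataset, batch_size=16, shuffle=True)

for batch_x, batch_y in loader:
    print(batch_x.shape, batch_y.shape)     # torch.Size([16, 13]) torch.Size([16, 1]); the last batch may be smaller
    break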

PyTorch Neural Networks

Core components

  • Layer: the basic building block of a neural network; it transforms an input tensor into an output tensor.
  • Model: a network composed of layers.
  • Loss function: the objective of parameter learning; parameters are learned by minimizing the loss.
  • Optimizer: the algorithm that updates the parameters so as to minimize the loss function.

PyTorch's nn module

nn is short for neural network; it is the module used to build neural networks in PyTorch and is imported as torch.nn.

https://pytorch.org/docs/stable/nn.html

Common convolution layers

https://pytorch.org/docs/stable/nn.html#convolution-layers

torch.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros', device=None, dtype=None)
  • in_channels (int) – Number of channels in the input image
  • out_channels (int) – Number of channels produced by the convolution
  • kernel_size (int or tuple) – Size of the convolving kernel
  • stride (int or tuple, optional) – Stride of the convolution. Default: 1
  • padding (int, tuple or str, optional) – Padding added to all four sides of the input. Default: 0
  • padding_mode (str, optional) – ‘zeros’, ‘reflect’, ‘replicate’ or ‘circular’. Default: ‘zeros’
  • dilation (int or tuple, optional) – Spacing between kernel elements. Default: 1
  • groups (int, optional) – Number of blocked connections from input channels to output channels. Default: 1
  • bias (bool, optional) – If True, adds a learnable bias to the output. Default: True
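A quick shape check for this layer; with kernel_size=3, stride=1 and padding=1 the spatial size is preserved while only the channel count changes (the input sizes below are arbitrary):

import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
x = torch.randn(8, 3, 32, 32)   # (batch, channels, height, width)
y = conv(x)
print(y.shape)                  # torch.Size([8, 16, 32, 32])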
Common pooling layers

https://pytorch.org/docs/stable/nn.html#pooling-layers

torch.nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
  • kernel_size (Union[int, Tuple[int, int]]) – the size of the window to take a max over
  • stride (Union[int, Tuple[int, int]]) – the stride of the window. Default value is kernel_size
  • padding (Union[int, Tuple[int, int]]) – Implicit negative infinity padding to be added on both sides
  • dilation (Union[int, Tuple[int, int]]) – a parameter that controls the stride of elements in the window
  • return_indices (bool) – if True, will return the max indices along with the outputs. Useful for torch.nn.MaxUnpool2d later
  • ceil_mode (bool) – when True, will use ceil instead of floor to compute the output shape
Activation functions

https://pytorch.org/docs/stable/nn.html#non-linear-activations-weighted-sum-nonlinearity

# https://pytorch.org/docs/stable/generated/torch.nn.ReLU.html#torch.nn.ReLU
torch.nn.ReLU()
# https://pytorch.org/docs/stable/generated/torch.nn.Sigmoid.html#torch.nn.Sigmoid
torch.nn.Sigmoid()
# https://pytorch.org/docs/stable/generated/torch.nn.Softmax.html#torch.nn.Softmax
torch.nn.Softmax()
# https://pytorch.org/docs/stable/generated/torch.nn.Tanh.html#torch.nn.Tanh
torch.nn.Tanh()
Loss functions

https://pytorch.org/docs/stable/nn.html#loss-functions

# Creates a criterion that measures the mean squared error between the input x (the model's prediction) and the target y. Commonly used for regression models.
# (The older size_average argument is deprecated in favor of reduction.)
torch.nn.MSELoss(reduction='mean')

# Cross-entropy loss (also called the negative log-likelihood loss), commonly used for classification models.
torch.nn.CrossEntropyLoss(weight=None, reduction='mean')
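A small usage sketch of both losses on random data, purely for illustration:

import torch
import torch.nn as nn

mse = nn.MSELoss()
pred = torch.randn(4, 1)
target = torch.randn(4, 1)
print(mse(pred, target))             # mean squared error over the batch

ce = nn.CrossEntropyLoss()
logits = torch.randn(4, 10)          # raw scores for 10 classes
labels = torch.tensor([3, 0, 7, 1])  # class indices
print(ce(logits, labels))            # softmax + negative log-likelihood in one call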
Fully connected (linear) layers

https://pytorch.org/docs/stable/nn.html#linear-layers

# Linear (affine) transformation
"""
in_features - size of each input sample
out_features - size of each output sample
bias - if set to False, the layer will not learn an additive bias. Default: True
"""
torch.nn.Linear(in_features, out_features, bias=True)
Layers for preventing overfitting

https://pytorch.org/docs/stable/nn.html#normalization-layers

"""
Applies Batch Normalization over a mini-batch of data.

num_features: number of features of the expected input
eps: a value added to the denominator for numerical stability (so the denominator cannot approach or reach 0). Default: 1e-5.
momentum: the momentum used for the running mean and running variance. Default: 0.1.
affine: a boolean; when set to True, the layer has learnable affine parameters.
"""
torch.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True)

# Randomly zeroes some of the elements of the input tensor. The zeroed elements are chosen independently on every forward call.
# p - probability of an element being zeroed. Default: 0.5
torch.nn.Dropout(p=0.5)

Others

https://pytorch.org/docs/stable/nn.html#transformer-layers

Building a network container

To define a custom neural network, build your layers on top of the Module class, torch.nn.Module, which is the base class of all networks. (A minimal sketch follows the comparison below.)

The difference between nn.Module and nn.functional:
The layers in nn come in two flavors,

  • one kind inherits from nn.Module and is named nn.Xxx (capitalized), e.g. nn.Linear, nn.Conv2d, nn.CrossEntropyLoss;
  • the other kind consists of functions in nn.functional, named nn.functional.xxx, e.g. nn.functional.linear, nn.functional.conv2d, nn.functional.cross_entropy.

Functionally the two are equivalent: any layer that can be built with nn.Module can also be expressed with nn.functional, and vice versa, and there is no significant performance difference between them. The official PyTorch recommendation is to use the nn.Xxx form for operations with learnable parameters (e.g. conv2d, linear, batch_norm), and either nn.functional.xxx or nn.Xxx, as you prefer, for operations without learnable parameters (e.g. max pooling, loss functions, activation functions).
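A minimal sketch of a custom network built by subclassing nn.Module; the layer sizes are arbitrary and chosen only for illustration. Layers with parameters are declared in __init__ as nn.Xxx modules, while the parameter-free activation is applied with nn.functional in forward:

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLP(nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden_dim)   # learnable parameters: declared as modules
        self.fc2 = nn.Linear(hidden_dim, out_dim)

    def forward(self, x):
        x = F.relu(self.fc1(x))                    # parameter-free op: the functional form is fine
        return self.fc2(x)

net = MLP(13, 16, 1)
print(net(torch.randn(4, 13)).shape)               # torch.Size([4, 1])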

PyTorch Optimizers

torch.optim.SGD


"""
Implements stochastic gradient descent, optionally with momentum or Nesterov (NAG) momentum, and supports a weight_decay term.

params (iterable) - the parameter groups the optimizer will manage.
lr (float) - the initial learning rate; it can be adjusted as training progresses.
momentum (float) - momentum factor, typically 0.8 or 0.9.
"""
torch.optim.SGD(params, lr, momentum=0)
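Whatever optimizer is chosen, the training step looks the same. A hedged sketch of one SGD update; the model, inputs and targets below are stand-ins for illustration:

import torch
import torch.nn as nn

model = nn.Linear(13, 1)                     # stand-in model
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

inputs, targets = torch.randn(8, 13), torch.randn(8, 1)

optimizer.zero_grad()                        # clear gradients from the previous step
loss = criterion(model(inputs), targets)     # forward pass and loss
loss.backward()                              # backpropagate
optimizer.step()                             # update the parameters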

torch.optim.Adam(AMSGrad)


"""
Adam (Adaptive Moment Estimation) is essentially RMSprop with a momentum term: it uses first- and second-moment estimates of the gradients to adapt the learning rate of each parameter. Its main advantage is that, after bias correction, the learning rate at each iteration stays within a bounded range, which keeps the parameter updates stable.
"""
torch.optim.Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08)

torch.optim.Adagrad


"""
Implements the Adagrad (Adaptive Gradient) method. Adagrad adapts a separate learning rate for each parameter; the adaptation depends on the magnitude of the gradients and the number of iterations: the larger the accumulated gradients, the smaller the learning rate, and vice versa. Its drawback is that late in training the learning rate becomes very small, because Adagrad accumulates the squares of all past gradients in the denominator. Starting from the base learning rate, AdaGrad automatically adjusts each parameter's learning rate independently, making large updates to sparse (infrequently updated) parameters and small updates to frequently updated ones, so it is well suited to sparse data. AdaGrad works well on some deep-learning models, but the accumulated squared gradients can make the learning rate shrink too early or too much.
"""
torch.optim.Adagrad(params, lr=0.01, lr_decay=0, weight_decay=0, initial_accumulator_value=0)

Model building steps

Project 1: Boston housing price prediction with a feed-forward regression model

# Imports
from sklearn.preprocessing import MinMaxScaler
from torch import nn
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
# For splitting the dataset
from sklearn.model_selection import train_test_split


# Load the data
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
data = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
target = raw_df.values[1::2, 2]


X = data               # features
Y = target             # target values
Y = Y.reshape(-1, 1)   # reshape y into a column vector

# Normalize the features
ss = MinMaxScaler()
X = ss.fit_transform(X)

# Convert X and Y from ndarray to tensor
X = torch.from_numpy(X).type(torch.FloatTensor)
Y = torch.from_numpy(Y).type(torch.FloatTensor)

# Split into training and test sets
train_x, test_x, train_y, test_y = train_test_split(X, Y, test_size=0.2)

# Build the network
model = nn.Sequential(
    nn.Linear(13, 16),  # 13 -> 16; train_x has shape (404, 13)
    nn.ReLU(),          # ReLU layer; output shape (404, 16)
    nn.Linear(16, 1)    # final fully connected layer producing y, shape (404, 1)
)

# Loss function and optimizer
criterion = nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.08)

# Training
max_epoch = 500
iter_loss = []
for i in range(max_epoch):
    # Forward pass
    y_pred = model(train_x)
    # Compute the loss
    loss = criterion(y_pred, train_y)
    if i % 50 == 0:
        print("Iteration {}: loss = {}".format(i, loss))

    iter_loss.append(loss.item())
    # Clear the previous gradients
    optimizer.zero_grad()
    # Backward pass
    loss.backward()
    # Update the weights
    optimizer.step()

# Testing
output = model(test_x)
predict_list = output.detach().numpy()
print(predict_list[:10])  # print only the first 10 rows

# Plot the loss over iterations
x = np.arange(max_epoch)
y = np.array(iter_loss)
plt.figure()
plt.plot(x, y)
plt.title('loss')
plt.xlabel('nums of iter')
plt.ylabel('loss func')
plt.show()

# Scatter plot of true vs. predicted values
x = np.arange(test_x.shape[0])
y1 = np.array(predict_list)  # predictions on the test set
y2 = np.array(test_y)        # ground truth on the test set
line1 = plt.scatter(x, y1, c='blue')
line2 = plt.scatter(x, y2, c='red')
plt.legend([line1, line2], ['y_predict', 'y_true'])
plt.title("loss of true value and predict value")
plt.ylabel('boston housing price')
plt.show()

outputs:

Iteration 0: loss = 596.5716552734375
Iteration 50: loss = 46.23166275024414
Iteration 100: loss = 24.163291931152344
Iteration 150: loss = 17.612567901611328
Iteration 200: loss = 15.434951782226562
Iteration 250: loss = 14.168197631835938
Iteration 300: loss = 13.293211936950684
Iteration 350: loss = 12.607232093811035
Iteration 400: loss = 12.067901611328125
Iteration 450: loss = 11.588809967041016
[[12.9814005]
[21.10761 ]
[16.77605 ]
[21.558886 ]
[21.683792 ]
[20.60617 ]
[20.92336 ]
[25.620314 ]
[15.320004 ]
[17.533672 ]]

Project 2: MNIST handwritten digit recognition


import torch
import torch.nn as nn
import torchvision
import torch.nn.functional as F
import torch.utils.data as Data
import matplotlib.pyplot as plt


EPOCH = 3
BATCH_SIZE = 50
LR = 0.001

root_dataset = './dataset'
train_data = torchvision.datasets.MNIST(
    root=root_dataset,
    train=True,
    transform=torchvision.transforms.ToTensor(),
    download=True
)

# Display a sample image
# print(train_data.data.size())
# print(train_data.targets.size())

# plt.imshow(train_data.data[66].numpy(), cmap='Greys')
# plt.title("%i" % train_data.targets[66])
# plt.show()

# Load the training data
train_loader = Data.DataLoader(
    dataset=train_data,
    batch_size=BATCH_SIZE,
    shuffle=True
)

# Load the test data
test_data = torchvision.datasets.MNIST(
    root=root_dataset,
    train=False,
    download=True
)

test_x = torch.unsqueeze(test_data.data, dim=1).type(torch.FloatTensor)[:2000]/255
test_y = test_data.targets[:2000]


class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Sequential(
            # convolution, shape (1, 28, 28) --> (16, 28, 28)
            nn.Conv2d(in_channels=1,    # input channels
                      out_channels=16,  # output channels
                      kernel_size=3,    # kernel size
                      stride=1,         # stride
                      padding=1         # zero-padding added to each side
                      ),
            # activation
            nn.ReLU(),
            # max pooling, shape (16, 28, 28) --> (16, 14, 14)
            nn.MaxPool2d(kernel_size=2)
        )

        self.conv2 = nn.Sequential(
            # convolution, shape (16, 14, 14) --> (32, 14, 14)
            nn.Conv2d(in_channels=16,
                      out_channels=32,
                      kernel_size=3,
                      stride=1,
                      padding=1
                      ),
            # activation
            nn.ReLU(),
            # max pooling, shape (32, 14, 14) --> (32, 7, 7)
            nn.MaxPool2d(kernel_size=2)
        )

        # fully connected layer
        self.output = nn.Linear(32*7*7, 10)

    def forward(self, x):
        out = self.conv1(x)
        out = self.conv2(out)
        out = out.view(out.size(0), -1)   # flatten to (batch, 32*7*7)
        out = self.output(out)

        return out


cnn = CNN()

# Optimizer
optimizer = torch.optim.Adam(cnn.parameters(), lr=LR)

# Loss function
loss_func = nn.CrossEntropyLoss()

# Training
for epoch in range(EPOCH):
    for step, (b_x, b_y) in enumerate(train_loader):
        # forward pass
        output = cnn(b_x)
        # compute the loss
        loss = loss_func(output, b_y)
        # clear the gradients
        optimizer.zero_grad()
        # backpropagate the loss
        loss.backward()
        # update the parameters
        optimizer.step()

        if step % 50 == 0:
            test_output = cnn(test_x)
            pred_y = torch.max(test_output, 1)[1].data.numpy()
            accuracy = float((pred_y == test_y.data.numpy()).astype(int).sum()) / float(test_y.size(0))
            print('epoch: {} step: {} train loss: {} test accuracy: {}'.format(epoch, step, loss.data.numpy(), accuracy))


# Save the model
torch.save(cnn, './checkpoint/cnn_minist.pkl')
print('training finished')

outputs:

epoch: 0 step: 0 train loss: 2.2989518642425537 test accuracy: 0.124
epoch: 0 step: 50 train loss: 0.6227200627326965 test accuracy: 0.761
epoch: 0 step: 100 train loss: 0.255100816488266 test accuracy: 0.873
epoch: 0 step: 150 train loss: 0.12506981194019318 test accuracy: 0.9035
epoch: 0 step: 200 train loss: 0.23605869710445404 test accuracy: 0.916
epoch: 0 step: 250 train loss: 0.10597402602434158 test accuracy: 0.9245
epoch: 0 step: 300 train loss: 0.19649431109428406 test accuracy: 0.9365
epoch: 0 step: 350 train loss: 0.09695011377334595 test accuracy: 0.945
epoch: 0 step: 400 train loss: 0.169553741812706 test accuracy: 0.9445
epoch: 0 step: 450 train loss: 0.07386567443609238 test accuracy: 0.956
epoch: 0 step: 500 train loss: 0.09177689254283905 test accuracy: 0.9525
epoch: 0 step: 550 train loss: 0.249346062541008 test accuracy: 0.957
epoch: 0 step: 600 train loss: 0.3329342305660248 test accuracy: 0.956
epoch: 0 step: 650 train loss: 0.2073971927165985 test accuracy: 0.9625
epoch: 0 step: 700 train loss: 0.07403621822595596 test accuracy: 0.964
epoch: 0 step: 750 train loss: 0.05281904339790344 test accuracy: 0.9615
epoch: 0 step: 800 train loss: 0.07417619973421097 test accuracy: 0.961
epoch: 0 step: 850 train loss: 0.1186814084649086 test accuracy: 0.9705
epoch: 0 step: 900 train loss: 0.11796750128269196 test accuracy: 0.965
epoch: 0 step: 950 train loss: 0.020582687109708786 test accuracy: 0.9715
epoch: 0 step: 1000 train loss: 0.028191855177283287 test accuracy: 0.9655
epoch: 0 step: 1050 train loss: 0.10790297389030457 test accuracy: 0.967
epoch: 0 step: 1100 train loss: 0.037335075438022614 test accuracy: 0.9635
epoch: 0 step: 1150 train loss: 0.1390017867088318 test accuracy: 0.9655
epoch: 1 step: 0 train loss: 0.07538899779319763 test accuracy: 0.9715
epoch: 1 step: 50 train loss: 0.06148580089211464 test accuracy: 0.9745
epoch: 1 step: 100 train loss: 0.059671998023986816 test accuracy: 0.9635
epoch: 1 step: 150 train loss: 0.0580081082880497 test accuracy: 0.9745
epoch: 1 step: 200 train loss: 0.012145531363785267 test accuracy: 0.97
epoch: 1 step: 250 train loss: 0.03478122875094414 test accuracy: 0.972
epoch: 1 step: 300 train loss: 0.03409402817487717 test accuracy: 0.9675
epoch: 1 step: 350 train loss: 0.03502879664301872 test accuracy: 0.9725
epoch: 1 step: 400 train loss: 0.03113013319671154 test accuracy: 0.978
epoch: 1 step: 450 train loss: 0.13314753770828247 test accuracy: 0.978
epoch: 1 step: 500 train loss: 0.018445264548063278 test accuracy: 0.97
epoch: 1 step: 550 train loss: 0.011954346671700478 test accuracy: 0.973
epoch: 1 step: 600 train loss: 0.0324656218290329 test accuracy: 0.9775
epoch: 1 step: 650 train loss: 0.046384457498788834 test accuracy: 0.9745
epoch: 1 step: 700 train loss: 0.05520062893629074 test accuracy: 0.979
epoch: 1 step: 750 train loss: 0.30722805857658386 test accuracy: 0.9765
epoch: 1 step: 800 train loss: 0.10514361411333084 test accuracy: 0.9765
epoch: 1 step: 850 train loss: 0.04414287954568863 test accuracy: 0.981
epoch: 1 step: 900 train loss: 0.1443830281496048 test accuracy: 0.978
epoch: 1 step: 950 train loss: 0.026100166141986847 test accuracy: 0.979
epoch: 1 step: 1000 train loss: 0.11013159900903702 test accuracy: 0.9795
epoch: 1 step: 1050 train loss: 0.10890169441699982 test accuracy: 0.977
epoch: 1 step: 1100 train loss: 0.07636130601167679 test accuracy: 0.98
epoch: 1 step: 1150 train loss: 0.12814483046531677 test accuracy: 0.9795
epoch: 2 step: 0 train loss: 0.15829549729824066 test accuracy: 0.977
epoch: 2 step: 50 train loss: 0.022714903578162193 test accuracy: 0.976
epoch: 2 step: 100 train loss: 0.11526823043823242 test accuracy: 0.98
epoch: 2 step: 150 train loss: 0.042159732431173325 test accuracy: 0.979
epoch: 2 step: 200 train loss: 0.10546272993087769 test accuracy: 0.979
epoch: 2 step: 250 train loss: 0.030654177069664 test accuracy: 0.98
epoch: 2 step: 300 train loss: 0.0656307265162468 test accuracy: 0.9795
epoch: 2 step: 350 train loss: 0.08443894982337952 test accuracy: 0.978
epoch: 2 step: 400 train loss: 0.035659801214933395 test accuracy: 0.976
epoch: 2 step: 450 train loss: 0.08835320174694061 test accuracy: 0.979
epoch: 2 step: 500 train loss: 0.02448669634759426 test accuracy: 0.976
epoch: 2 step: 550 train loss: 0.023640567436814308 test accuracy: 0.98
epoch: 2 step: 600 train loss: 0.0034627816639840603 test accuracy: 0.9775
epoch: 2 step: 650 train loss: 0.10121780633926392 test accuracy: 0.9785
epoch: 2 step: 700 train loss: 0.0032379289623349905 test accuracy: 0.9795
epoch: 2 step: 750 train loss: 0.053985223174095154 test accuracy: 0.9805
epoch: 2 step: 800 train loss: 0.014309514313936234 test accuracy: 0.9805
epoch: 2 step: 850 train loss: 0.030016331002116203 test accuracy: 0.9785
epoch: 2 step: 900 train loss: 0.08062826097011566 test accuracy: 0.983
epoch: 2 step: 950 train loss: 0.005085643380880356 test accuracy: 0.9755
epoch: 2 step: 1000 train loss: 0.046167097985744476 test accuracy: 0.9805
epoch: 2 step: 1050 train loss: 0.002461640862748027 test accuracy: 0.979
epoch: 2 step: 1100 train loss: 0.04258471354842186 test accuracy: 0.9835
epoch: 2 step: 1150 train loss: 0.0026853044982999563 test accuracy: 0.9805
training finished

The accuracy already reaches about 98%. Training again with more epochs, a different batch size, and a lower learning rate raises the accuracy to about 98.4%.

EPOCH = 20
BATCH_SIZE=100
LR = 0.0002

epoch: 0 step: 0 train loss: 2.3289575576782227 test accuracy: 0.073
epoch: 0 step: 50 train loss: 2.133652448654175 test accuracy: 0.596
epoch: 0 step: 100 train loss: 1.4860273599624634 test accuracy: 0.736
epoch: 0 step: 150 train loss: 0.8417261242866516 test accuracy: 0.7835
epoch: 0 step: 200 train loss: 0.6684989929199219 test accuracy: 0.821
epoch: 0 step: 250 train loss: 0.5492635369300842 test accuracy: 0.8485
epoch: 0 step: 300 train loss: 0.42627930641174316 test accuracy: 0.866
epoch: 0 step: 350 train loss: 0.2847326695919037 test accuracy: 0.876
epoch: 0 step: 400 train loss: 0.29851987957954407 test accuracy: 0.886
...............
epoch: 19 step: 0 train loss: 0.14779159426689148 test accuracy: 0.985
epoch: 19 step: 50 train loss: 0.0102002564817667 test accuracy: 0.9825
epoch: 19 step: 100 train loss: 0.04536750167608261 test accuracy: 0.984
epoch: 19 step: 150 train loss: 0.011938444338738918 test accuracy: 0.9825
epoch: 19 step: 200 train loss: 0.026133885607123375 test accuracy: 0.983
epoch: 19 step: 250 train loss: 0.00792818795889616 test accuracy: 0.9815
epoch: 19 step: 300 train loss: 0.07967539131641388 test accuracy: 0.985
epoch: 19 step: 350 train loss: 0.02291872352361679 test accuracy: 0.985
epoch: 19 step: 400 train loss: 0.02513641119003296 test accuracy: 0.9835
epoch: 19 step: 450 train loss: 0.06369250267744064 test accuracy: 0.9835
epoch: 19 step: 500 train loss: 0.017594920471310616 test accuracy: 0.9845
epoch: 19 step: 550 train loss: 0.020058415830135345 test accuracy: 0.984
training finished

Using the saved model:

# Load the saved model
cnn = torch.load('./checkpoint/cnn_minist.pkl')
# Run the model on a few test samples
test_output = cnn(test_x[:20])

pred_y = torch.max(test_output, 1)[1].data.numpy()
print(pred_y, 'prediction number')
print(test_y[:20].numpy(), 'real number')

# Overall accuracy of the model
test_output1 = cnn(test_x)
pred_y1 = torch.max(test_output1, 1)[1].data.numpy()
accuracy = float((pred_y1 == test_y.data.numpy()).astype(int).sum()) / float(test_y.size(0))
print('accuracy', accuracy)

"""
output:

[7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4] prediction number
[7 2 1 0 4 1 4 9 5 9 0 6 9 0 1 5 9 7 3 4] real number
accuracy 0.9855
"""