Handwritten Digit Recognition on the MNIST Database

I have been studying machine learning recently, and working only from videos and textbooks felt too abstract. So I reimplemented the handwritten digit recognition assignment from Professor Andrew Ng's course in the PyTorch framework.

Handwritten digit recognition is a classic example of solving a problem with a neural network. After getting a basic grasp of PyTorch, I built an MLP to tackle it. The steps are as follows:

  1. Build a three-layer network and fix the dimensions of the input, hidden, and output layers;
  2. Forward propagation: write the forward function yourself; it mainly involves some matrix operations;
  3. Define the loss function; cross-entropy is used here (see the formula after this list);
  4. Backpropagation: update the weights (using the built-in torch.optim.Adam optimizer);
  5. Train the network and test it.
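
For reference, the cross-entropy loss in step 3 (PyTorch's nn.CrossEntropyLoss) applies log-softmax to the network's raw outputs (logits) $z$ and averages the negative log-likelihood of the true labels $y_i$ over the $N$ samples:

$$\ell = -\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp(z_{i,y_i})}{\sum_{j=1}^{10}\exp(z_{i,j})}$$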

The code is as follows:

from argparse import Namespace
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy.io as sio
import torch
import random

# Arguments
args = Namespace(
    seed=1234,
    #num_samples_per_class=500,
    dimensions=784,
    num_classes=10,
    train_size=0.75,
    test_size=0.25,
    num_hidden_units=30,
    learning_rate=4e-2,
    regularization=1e-3,
    num_epochs=200,
    #dropout_p=0.1,
)

# Set seeds for reproducibility (random.sample and torch use their own RNGs)
np.random.seed(args.seed)
random.seed(args.seed)
torch.manual_seed(args.seed)

# Load a MATLAB .mat file
# Use '/' instead of '\' (or a raw string like r'C:\...') so that '\' is not
# interpreted as an escape character in Python
matdata = 'C:/Users/HIT/Documents/MATLAB/data.mat'
data = sio.loadmat(matdata)

# Convert to numpy arrays and keep only the first 5000 samples
X1 = np.transpose(data['image'])
X = X1[:5000, :]
y1 = np.array(data['label'])
y = y1[:5000, :]

# Convert the numpy arrays to tensors
X = torch.from_numpy(X).float()
y = torch.from_numpy(y).long()
# len() returns the size of the first dimension (the number of rows)
print(len(X))
print(len(y))
# The full dataset is 60000x784 (one image per row) with 60000x1 labels
# spanning the 10 classes 0-9; only the first 5000 samples are used here
# Shuffle the data
shuffle_indices = torch.LongTensor(random.sample(range(0, len(X)), len(X)))
X = X[shuffle_indices]
y = y[shuffle_indices]

# Split the data into training and test sets
test_start_idx = int(len(X) * args.train_size)
X_train = X[:test_start_idx]
y_train1 = y[:test_start_idx]
y_train = y_train1.squeeze()  # drop the extra dimension
X_test = X[test_start_idx:]
y_test1 = y[test_start_idx:]
y_test = y_test1.squeeze()
print("We have %i train samples and %i test samples." % (len(X_train), len(X_test)))

# PyTorch neural-network modules (torch itself is already imported above)
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

class MLP(nn.Module):
    # Define the network structure
    def __init__(self, input_dim, hidden_dim, output_dim):
        super(MLP, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)   # first layer: z1 = X*W1 + b1
        self.fc2 = nn.Linear(hidden_dim, output_dim)  # second layer: z2 = a_1*W2 + b2

    def init_weights(self):
        # Xavier normal initialization of the weights
        nn.init.xavier_normal_(self.fc1.weight, gain=nn.init.calculate_gain('relu'))
        nn.init.xavier_normal_(self.fc2.weight)

    # Forward propagation
    def forward(self, x_in, apply_softmax=False):
        a_1 = F.relu(self.fc1(x_in))  # activation function: a_1 = relu(z1)
        #a_1 = self.dropout(a_1)  # optionally drop some neurons
        y_pred = self.fc2(a_1)  # raw logits: y_pred = z2 (no activation here)

        if apply_softmax:
            # softmax maps the logits to probabilities in (0, 1)
            y_pred = F.softmax(y_pred, dim=1)

        return y_pred
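
# Quick sanity check (illustrative only): the forward pass should map a batch
# of N 784-dimensional inputs to an N x 10 matrix of logits
_check = MLP(input_dim=784, hidden_dim=30, output_dim=10)
print(_check(torch.randn(2, 784)).shape)  # expected: torch.Size([2, 10])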

# Initialize model
model = MLP(input_dim=args.dimensions,
            hidden_dim=args.num_hidden_units,
            output_dim=args.num_classes)
model.init_weights()  # init_weights is not called automatically
print(model)

# Accuracy
def get_accuracy(y_pred, y_target):
    # Count how many times y_pred equals y_target
    n_correct = torch.eq(y_pred, y_target).sum().item()
    accuracy = n_correct / len(y_pred) * 100
    return accuracy

# Optimization
loss_fn = nn.CrossEntropyLoss()  # cross-entropy loss (expects raw logits)
optimizer = optim.Adam(model.parameters(), lr=args.learning_rate,
                       weight_decay=args.regularization)  # L2 penalty on the weights

# Training (full-batch: the whole training set is used in every epoch)
for t in range(args.num_epochs):
    # Forward pass
    y_pred = model(X_train)

    # Accuracy
    _, predictions = y_pred.max(dim=1)
    accuracy = get_accuracy(y_pred=predictions.long(), y_target=y_train)

    # Loss
    loss = loss_fn(y_pred, y_train)

    # Verbose
    if t % 20 == 0:
        print("epoch: {0:02d} | loss: {1:.4f} | acc: {2:.1f}%".format(
            t, loss.item(), accuracy))
        print(predictions)
        print(y_train)
    # Zero all gradients so they do not accumulate across iterations
    optimizer.zero_grad()

    # Backward pass
    loss.backward()

    # Update weights
    optimizer.step()

# Predictions (softmax is monotonic, so taking argmax of the raw logits
# would give the same classes; softmax just yields probabilities)
_, pred_train = model(X_train, apply_softmax=True).max(dim=1)
_, pred_test = model(X_test, apply_softmax=True).max(dim=1)

# Train and test accuracies
train_acc = get_accuracy(y_pred=pred_train, y_target=y_train)
test_acc = get_accuracy(y_pred=pred_test, y_target=y_test)
print ("train acc: {0:.1f}%, test acc: {1:.1f}%".format(train_acc, test_acc))
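
matplotlib is imported at the top but never used. As an optional extra, here is a small sketch that displays a few test digits alongside their predicted labels, assuming each 784-dimensional row is a flattened 28x28 grayscale MNIST image:

# Optional: show the first five test images with their predicted labels
# (assumes each 784-dim row is a flattened 28x28 grayscale image)
fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for i, ax in enumerate(axes):
    ax.imshow(X_test[i].reshape(28, 28).numpy(), cmap='gray')
    ax.set_title('pred: %d' % pred_test[i].item())
    ax.axis('off')
plt.show()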

After training, the accuracy reaches about 98% on the training set and about 93% on the test set. Although the program uses regularization (the Adam weight_decay term) to combat overfitting, it does not seem to help much here; I plan to try other tuning approaches later, such as the dropout sketch below. This is my first blog post and my skill is limited, so I would appreciate any advice from more experienced readers!
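
One such approach is the dropout stub already hinted at by the commented-out dropout_p argument and self.dropout line in the code above. A minimal sketch of wiring it up, assuming dropout_p=0.1 as in the commented argument:

class MLPWithDropout(nn.Module):
    def __init__(self, input_dim, hidden_dim, output_dim, dropout_p=0.1):
        super(MLPWithDropout, self).__init__()
        self.fc1 = nn.Linear(input_dim, hidden_dim)
        self.dropout = nn.Dropout(p=dropout_p)  # randomly zeroes hidden activations during training
        self.fc2 = nn.Linear(hidden_dim, output_dim)

    def forward(self, x_in, apply_softmax=False):
        a_1 = F.relu(self.fc1(x_in))
        a_1 = self.dropout(a_1)  # active only in train mode
        y_pred = self.fc2(a_1)
        if apply_softmax:
            y_pred = F.softmax(y_pred, dim=1)
        return y_pred

With this variant, call model.train() before the training loop and model.eval() before computing the final predictions, so that dropout is disabled at test time.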

 
