A sample image from the dataset looks like this; its label is 5.
Download the gesture recognition dataset from the dataset page, then unzip it:
!cd /home/aistudio/data/data23668 && unzip -qo Dataset.zip
!cd /home/aistudio/data/data23668/Dataset && rm -f */.DS_Store # remove junk files
After unzipping, the directory structure looks like this: each digit label has its own folder containing the corresponding images.
1. Import the required libraries
import os
import time
import random
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
import paddle
import paddle.fluid as fluid
import paddle.fluid.layers as layers
from multiprocessing import cpu_count
from paddle.fluid.dygraph import Pool2D,Conv2D
from paddle.fluid.dygraph import Linear,Sequential,BatchNorm, CosineDecay
2. Generate train_data.list and test_data.list from the image paths and their labels
# Generate the image lists
data_path = '/home/aistudio/data/data23668/Dataset'
character_folders = os.listdir(data_path)
# print(character_folders)
if os.path.exists('./train_data.list'):
    os.remove('./train_data.list')
if os.path.exists('./test_data.list'):
    os.remove('./test_data.list')
for character_folder in character_folders:
    with open('./train_data.list', 'a') as f_train:
        with open('./test_data.list', 'a') as f_test:
            if character_folder == '.DS_Store':
                continue
            character_imgs = os.listdir(os.path.join(data_path, character_folder))
            count = 0
            for img in character_imgs:
                if img == '.DS_Store':
                    continue
                if count % 10 == 0:  # 9:1 train/test split
                    f_test.write(os.path.join(data_path, character_folder, img) + '\t' + character_folder + '\n')
                else:
                    f_train.write(os.path.join(data_path, character_folder, img) + '\t' + character_folder + '\n')
                count += 1
print('Lists generated')
Reading back the generated train_data.list, we can see that each image's path and label occupy one tab-separated line, which is convenient for the reader defined later.
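The 9:1 split rule used above can be sketched on its own, without touching the file system (the sample names here are made up):

```python
# A minimal sketch of the 9:1 split: every 10th sample (count % 10 == 0)
# goes to the test list, the rest go to the train list.
samples = ["img_{}.jpg".format(i) for i in range(20)]  # hypothetical file names
train, test = [], []
for count, name in enumerate(samples):
    (test if count % 10 == 0 else train).append(name)
print(len(train), len(test))  # 18 2
```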
3. Define the readers for the training and test sets
# Define the readers for the training and test sets
def data_mapper(sample):
    img, label = sample
    img = Image.open(img)
    img = img.resize((100, 100), Image.ANTIALIAS)  # resize takes (width, height); ANTIALIAS smooths the result
    img = np.array(img).astype('float32')
    img = img.transpose((2, 0, 1))  # HWC -> CHW (channel, height, width)
    img = img / 255.0
    return img, label

def data_reader(data_list_path):
    def reader():
        with open(data_list_path, 'r') as f:
            lines = f.readlines()
            for line in lines:
                img, label = line.split('\t')
                yield img, int(label)
    return paddle.reader.xmap_readers(data_mapper, reader, cpu_count(), 512)
I wasn't sure what np.transpose does here, so I looked it up and ran a small test: read one image in and print its data and shape before and after the transpose.
The image is originally read with the pixels interleaved as rgb, rgb, rgb, …; after the transpose they are grouped as rrr…, ggg…, bbb…. This channel-first layout is said to help model accuracy.
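The same observation can be reproduced with a tiny 2x2 "image" instead of a real photo:

```python
import numpy as np

# A 2x2 "image" in HWC (height, width, channel) order, as PIL returns it;
# the 12 values stand in for 4 pixels of interleaved R, G, B components.
img = np.arange(12).reshape(2, 2, 3)
print(img.reshape(-1))           # channels interleaved: R,G,B,R,G,B,...
chw = img.transpose((2, 0, 1))   # CHW (channel, height, width) order
print(chw.reshape(-1))           # channels grouped: RRRR, GGGG, BBBB
```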
4. Data providers
# Data provider for training
train_reader = paddle.batch(reader=paddle.reader.shuffle(reader=data_reader('./train_data.list'), buf_size=4000), batch_size=32)  # buf_size is the size of the shuffle buffer
# Data provider for testing
test_reader = paddle.batch(reader=data_reader('./test_data.list'), batch_size=32)
According to the discussion board, increasing buf_size helps accuracy; for what it does, see the blog post on the role of buf_size in paddle.reader.shuffle().
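A simplified sketch of what a buffered shuffle does (an illustration, not Paddle's actual implementation): samples are collected into a buffer of buf_size and yielded in random order, so a larger buffer mixes samples from farther apart and gets closer to a full shuffle.

```python
import random

def buffered_shuffle(reader, buf_size, seed=0):
    # Fill a buffer of buf_size samples, shuffle it, drain it, repeat.
    rng = random.Random(seed)
    buf = []
    for sample in reader:
        buf.append(sample)
        if len(buf) >= buf_size:
            rng.shuffle(buf)
            while buf:
                yield buf.pop()
    rng.shuffle(buf)  # drain whatever is left at the end
    while buf:
        yield buf.pop()

out = list(buffered_shuffle(range(10), buf_size=4))
print(sorted(out) == list(range(10)))  # True: same samples, new order
```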
5. Define the network
We use a simple fully connected network, in which every hidden neuron is connected to every input.
class MyDNN(fluid.dygraph.Layer):
    def __init__(self):
        super(MyDNN, self).__init__()
        self.hidden1 = Linear(100, 100, act='relu')
        self.hidden2 = Linear(100, 100, act='relu')
        self.hidden3 = Linear(100, 100, act='relu')
        self.hidden4 = Linear(3*100*100, 10, act='softmax')

    def forward(self, input):
        x = self.hidden1(input)
        x = self.hidden2(x)
        x = self.hidden3(x)
        x = fluid.layers.reshape(x, shape=[-1, 3*100*100])  # flatten the 3 channels
        y = self.hidden4(x)
        return y
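Note that fluid's dygraph Linear applies its weight matrix along the last axis of the input, which is why Linear(100, 100) can take the [N, 3, 100, 100] tensor directly: the three hidden layers act on the width axis and keep the shape. A NumPy shape check of this layout (an illustration, not Paddle code):

```python
import numpy as np

# Matmul over the last axis: a (100, 100) weight applied to a
# [N, 3, 100, 100] input keeps the shape unchanged.
x = np.zeros((2, 3, 100, 100), dtype=np.float32)  # a batch of 2 images
w = np.zeros((100, 100), dtype=np.float32)        # stands in for a Linear(100, 100) weight
h = x @ w
print(h.shape)     # (2, 3, 100, 100)
flat = h.reshape(-1, 3 * 100 * 100)               # the flatten before hidden4
print(flat.shape)  # (2, 30000)
```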
6. Train the model
epoch = 60
# Train in dygraph (imperative) mode
with fluid.dygraph.guard(place=fluid.CPUPlace()):
    model = MyDNN()  # instantiate the model
    model.train()    # training mode
    # opt = fluid.optimizer.SGDOptimizer(learning_rate=0.001, parameter_list=model.parameters())  # plain SGD, lr=0.001
    # opt = fluid.optimizer.Adam(learning_rate=0.01, parameter_list=model.parameters())
    opt = fluid.optimizer.SGDOptimizer(
        learning_rate=CosineDecay(0.0125, epoch, epoch),  # cosine decay
        parameter_list=model.parameters())  # SGD optimizer
    epochs_num = epoch  # number of epochs
    for pass_num in range(epochs_num):
        for batch_id, data in enumerate(train_reader()):
            images = np.array([x[0].reshape(3, 100, 100) for x in data], np.float32)
            labels = np.array([x[1] for x in data]).astype('int64')
            # print(labels.shape)
            labels = labels[:, np.newaxis]
            # print(images.shape, '\n', labels.shape)
            image = fluid.dygraph.to_variable(images)
            label = fluid.dygraph.to_variable(labels)
            predict = model(image)  # forward pass
            # print(predict.shape)
            # print(predict)
            loss = fluid.layers.cross_entropy(predict, label)
            avg_loss = fluid.layers.mean(loss)  # mean loss over the batch
            acc = fluid.layers.accuracy(predict, label)  # batch accuracy
            if batch_id != 0 and batch_id % 50 == 0:
                print("train_pass:{},batch_id:{},train_loss:{},train_acc:{}".format(pass_num, batch_id, avg_loss.numpy(), acc.numpy()))
            avg_loss.backward()
            opt.minimize(avg_loss)
            model.clear_gradients()
    fluid.save_dygraph(model.state_dict(), 'MyDNN')  # save the model
A more detailed introduction to cosine decay is in the Baidu PaddlePaddle (飞桨) API docs.
The training output is shown below; from about epoch 39 onward the training accuracy stays close to 1.
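CosineDecay(0.0125, epoch, epoch) anneals the learning rate from 0.0125 down to 0 over the 60 epochs. The formula from the docs, lr = lr0/2 * (cos(epoch * pi / total_epochs) + 1), can be checked by hand:

```python
import math

def cosine_decay(lr0, epoch, total_epochs):
    # Cosine annealing from lr0 (epoch 0) down to 0 (final epoch).
    return lr0 / 2 * (math.cos(epoch * math.pi / total_epochs) + 1)

print(round(cosine_decay(0.0125, 0, 60), 6))   # 0.0125 at the start
print(round(cosine_decay(0.0125, 30, 60), 6))  # 0.00625 halfway through
print(round(cosine_decay(0.0125, 60, 60), 6))  # 0.0 at the end
```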
7. Validate on the test set
# Model validation
with fluid.dygraph.guard():
    accs = []
    model_dict, _ = fluid.load_dygraph('MyDNN')
    model = MyDNN()
    model.load_dict(model_dict)  # load the model parameters
    model.eval()  # evaluation mode
    for batch_id, data in enumerate(test_reader()):  # iterate over the test set
        images = np.array([x[0].reshape(3, 100, 100) for x in data], np.float32)
        labels = np.array([x[1] for x in data]).astype('int64')
        labels = labels[:, np.newaxis]
        image = fluid.dygraph.to_variable(images)
        label = fluid.dygraph.to_variable(labels)
        predict = model(image)
        acc = fluid.layers.accuracy(predict, label)
        loss = fluid.layers.cross_entropy(predict, label)
        avg_loss = fluid.layers.mean(loss)  # mean loss over the batch
        if batch_id >= 0:
            print("batch_id:{},test_loss:{},test_acc:{}".format(batch_id, avg_loss.numpy(), acc.numpy()))
        accs.append(acc.numpy()[0])
    avg_acc = np.mean(accs)
    print(avg_acc)
The test results are shown below (much higher than my first run; maybe that is the larger buf_size at work).
8. Predict a single image
# Load an image and run prediction on it
def load_image(path):
    img = Image.open(path)
    img = img.resize((100, 100), Image.ANTIALIAS)
    img = np.array(img).astype('float32')
    img = img.transpose((2, 0, 1))
    img = img / 255.0
    print(img.shape)
    return img

# Build the inference pass in dygraph mode
with fluid.dygraph.guard():
    infer_path = '手势.JPG'
    model = MyDNN()
    model_dict, _ = fluid.load_dygraph('MyDNN')
    model.load_dict(model_dict)  # load the model parameters
    model.eval()  # evaluation mode
    infer_img = load_image(infer_path)
    infer_img = np.array(infer_img).astype('float32')
    infer_img = infer_img[np.newaxis, :, :, :]  # add the batch dimension
    infer_img = fluid.dygraph.to_variable(infer_img)
    result = model(infer_img)
    display(Image.open('手势.JPG'))
    print(np.argmax(result.numpy()))
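np.argmax simply picks the index of the largest softmax probability, which is the predicted label (the probability vector below is made up for illustration):

```python
import numpy as np

# One softmax probability per class; the prediction is the largest one.
# (These numbers are invented; a real model produces its own vector.)
probs = np.array([0.05, 0.10, 0.02, 0.03, 0.05, 0.60, 0.05, 0.04, 0.03, 0.03])
print(np.argmax(probs))  # 5
```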
I took a photo of my own hand and tried it, but it flopped: the image was misclassified as 0. Maybe my hand is too dark, haha. The shooting angle and shadows probably matter too; I'll look into it some more.
References:
Baidu AI Studio
The role of buf_size in paddle.reader.shuffle() (blog post)
cosine_decay official API