Flower Classification with SppLayer

Contents

  • Flower Classification with SppLayer
  • Introduction
  • I. Flatten, GlobalAveragePooling2D, and SpatialPyramidPooling
    • 1. Flatten
    • 2. GlobalAveragePooling2D
    • 3. SpatialPyramidPooling
  • II. Keras Implementation
    • 1. Notes
    • 2. Implementation
  • Summary
  • References


Introduction

This is the second post of my May Day holiday. In it I look at what sits between the convolutional layers and the fully connected layers: using TensorFlow and Keras, I compare Flatten, GlobalAveragePooling2D, and SPP (SpatialPyramidPooling, spatial pyramid pooling), and build a small example with an SppLayer. I have only just started learning image classification, so everything here is framed in terms of image classification, and the 3D blocks mentioned in this article follow the (H, W, C) convention (height, width, channels). Corrections from more experienced readers are welcome. Resource links are listed below:

Content                                        Link
Dataset                                        link
Kaggle example of flower classification        link
Flatten, GlobalAveragePooling2D, SPP           link
Cloud drive                                    link (extraction code: qp09)

I. Flatten, GlobalAveragePooling2D, and SpatialPyramidPooling

1. Flatten

The idea behind Flatten is simple: it forcibly unrolls the 3D block (excluding batch_size) produced by the convolutions into a 1D vector, as shown below:
[Figure 1: Flatten illustration]
Suppose that after several rounds of convolution and pooling we end up with a 7x7x5 block. Flatten turns it into a 1D vector of 7x7x5 = 245 elements; connect a fully connected layer with as many nodes as there are classes, add a softmax, and the classification network is complete. So what is the drawback? If the block left after the convolutions is fairly large, the parameter count explodes. For example, in my experiment the input is a 224x224x3 image and VGG16 (without its fully connected layers) does the convolutional work, as shown below:

Model: "model_flatten"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
vgg16 (Functional)           (None, 7, 7, 512)         14714688  
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
dense (Dense)                (None, 10)                250890    
=================================================================
Total params: 14,965,578
Trainable params: 14,965,578
Non-trainable params: 0
_________________________________________________________________

Flattening the 3D block after the convolutions yields a 1D vector of 7x7x512 = 25088 elements. Connecting a 10-node fully connected layer for 10-way classification means this one layer needs 7x7x512x10 + 10 = 250890 trainable parameters, where the extra 10 is the bias. And if I wanted 1000 classes? That would be 25088x1000 + 1000 = 25,089,000. Far too many parameters. Another drawback is that, because the parameter count depends on the image height and width, the network input must be a fixed image size.
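For reference, a minimal sketch of this Flatten head (assuming VGG16 from tf.keras.applications with random weights, written only to inspect the parameter count; layer and variable names are my own):

import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

backbone = VGG16(include_top=False, weights=None, input_shape=(224, 224, 3))
x = layers.Flatten()(backbone.output)             # (None, 7*7*512) = (None, 25088)
out = layers.Dense(10, activation='softmax')(x)   # 25088*10 + 10 = 250,890 parameters
model_flatten = Model(backbone.input, out, name='model_flatten')
model_flatten.summary()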

2. GlobalAveragePooling2D

GlobalAveragePooling2D is a special case of average pooling: there is no pool_size or strides to specify, and the output is simply the mean of each channel, so the result depends only on the number of channels. A picture explains it best:
[Figure 2: GlobalAveragePooling2D illustration]
If the final task happens to have exactly as many classes as there are channels (five, in the figure), great: no fully connected layer is needed, you can apply softmax and classify directly. If the channel count differs from the number of classes, add a fully connected layer and then classify; either way the parameter count shrinks dramatically. Again using VGG16 for the convolutions, this time followed by global average pooling (GlobalAveragePooling2D):

Model: "model_global_avg_pooling"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
vgg16 (Functional)           (None, None, None, 512)   14714688  
_________________________________________________________________
global_average_pooling2d (Gl (None, 512)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                5130      
=================================================================
Total params: 14,719,818
Trainable params: 14,719,818
Non-trainable params: 0
_________________________________________________________________

Note that here I no longer have to fix the image size! After the VGG16 convolutions we get a 512-channel block; GlobalAveragePooling2D turns it into a 1D vector of 512 values, and a fully connected layer for, say, 10 classes then needs only 512x10 + 10 = 5130 trainable parameters, far fewer than the 250890 produced by Flatten above.
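A minimal sketch of the GlobalAveragePooling2D head under the same assumptions (random-weight VGG16, names of my own choosing); note that the spatial dimensions can now be left as None:

import tensorflow as tf
from tensorflow.keras import layers, Model
from tensorflow.keras.applications import VGG16

# Height and width are unspecified: global average pooling removes the dependence on them.
backbone = VGG16(include_top=False, weights=None, input_shape=(None, None, 3))
x = layers.GlobalAveragePooling2D()(backbone.output)   # (None, 512)
out = layers.Dense(10, activation='softmax')(x)        # 512*10 + 10 = 5,130 parameters
model_gap = Model(backbone.input, out, name='model_global_avg_pooling')
model_gap.summary()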

3. SpatialPyramidPooling

SpatialPyramidPooling, or spatial pyramid pooling. Ordinarily the images fed into a convolutional network must have a specified size, so when an image does not match that size we usually crop or warp it so it can go through the network. The pipeline looks like this:

image → crop/warp → conv layers → fc layers → output

The downside of doing this is obvious:
[Figure 3: distortion introduced by cropping/warping]
It can make the object we want to recognize no longer look like that object. This is where the spatial pyramid structure comes in, shown below:
[Figure 4: spatial pyramid pooling structure]
Suppose that after convolution and pooling we obtain a set of feature maps of size HxWx256. We max-pool them with 4x4, 2x2, and 1x1 grids, then concatenate the resulting 16x256-d, 4x256-d, and 1x256-d vectors into a single output vector. Now the fully connected layers no longer depend on the image size either; only the channel count matters. The pipeline becomes (see the small worked example after it):

image → conv layers → spatial pyramid pooling → fc layers → output
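To make the output size concrete, here is a small worked example of my own, matching the figure above: with pooling grids [1, 2, 4] and 256 channels, the SPP output has (1x1 + 2x2 + 4x4) x 256 = 21 x 256 = 5376 values, no matter what H and W are:

def spp_output_length(pool_list, channels):
    # one max value per grid cell per channel
    return sum(n * n for n in pool_list) * channels

print(spp_output_length([1, 2, 4], 256))  # 5376, independent of H and W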

II. Keras Implementation

1. Notes

Enough theory; let's look at the code. I was not able to write the SppLayer from scratch myself, so I adapted the code from a GitHub repository (link). It was written for an older Keras version, and I made a few small changes so it runs on Keras 2.4. Flatten and GlobalAveragePooling2D are both built into Keras; have a look at their source if you are interested.

2. Implementation

  • First, the image-reading and data-visualization functions. I used them in the previous post as well; they are collected here for easy reuse.
# Utility functions: one to visualize the data, one to load the classified images; the SPPLayer comes later
import cv2
from PIL import Image
import numpy as np
import matplotlib.pyplot as plt
import os
def visual_train_data(train_path,classes): # visualize the per-class image counts
    """

    :param train_path: training-data directory
    :param classes: label dictionary, e.g. classes = { 0:'Speed limit (20km/h)',
            1:'Speed limit (30km/h)',
            2:'Speed limit (50km/h)',
            3:'Speed limit (60km/h)',
            4:'Speed limit (70km/h)'}
    :return: train_num, class_num (counts and class names, sorted by count)
    """
    print(classes)
    floders = os.listdir(train_path)
    train_num = []
    class_num = []
    index=0
    for floder in floders:
        print(floder)
        if floder=='flowers': # the online dataset adds an extra 'flowers' folder; skip it
            continue
        train_files = os.listdir(train_path + '/' + floder)
        train_num.append(len(train_files))
        class_num.append(classes[index])
        index+=1
    zipped_lists = zip(train_num, class_num)
    sorted_pair = sorted(zipped_lists)
    tuples = zip(*sorted_pair)  # unzip back into two tuples
    train_num, class_num = [list(tuple) for tuple in tuples]
    plt.figure(figsize=(21, 10))
    plt.bar(class_num, train_num)
    plt.xticks(class_num, rotation='vertical')
    plt.show()
    return train_num,class_num


def load_train_data(train_data_dir,imgage_shape=(None,None,3)):
    """
    :param train_data_dir:  training-set directory
    :param imgage_shape:  image height, width and channel count
    :return:  image_data (np.array) image data, image_labels labels, type_dict index-to-class mapping
    """
    IMG_HEIGHT=imgage_shape[0]
    IMG_WEIGHT=imgage_shape[1]
    img_resize=None
    floders = os.listdir(train_data_dir)
    image_data = []  # holds the images
    image_labels = []  # holds the labels
    type_dict = {}  # maps class index to class name
    index = -1  # running index used for the dict keys and labels
    for floder in floders:
        path = train_data_dir + '/' + floder
        if floder=='flowers':
            continue
        print('loading ' + path)
        index += 1  # classes are numbered from 0
        type_dict[index] = floder.split('-')[-1]
        images = os.listdir(path)
        for img in images:
            try:  # guard against unreadable files
                image = cv2.imread(path + '/' + img)
                if IMG_WEIGHT is not None and IMG_HEIGHT is not None:
                    img_resize = cv2.resize(image,(IMG_WEIGHT,IMG_HEIGHT))
                else:
                    img_resize = image
                image_data.append(img_resize)
                image_labels.append(index)

            except Exception as err:
                print(err)
                print('Error in ' + img)
    image_data = np.array(image_data, np.float32)
    image_labels = np.array(image_labels, np.int64)  # np.int is deprecated in newer NumPy
    print("loading finished")
    print("image_data shape ",image_data.shape)
    print("image_labels shape",image_labels.shape)
    return image_data,image_labels,type_dict
  • Next we call these functions, and then move on to training.
# load the data
train_data_dir='../input/flowers-recognition/flowers'
image_data,image_labels,dict_type=load_train_data(train_data_dir,(224,224,3))
print(dict_type)
visual_train_data(train_data_dir,dict_type) # plot per-class counts

The resulting count plot looks roughly like this:
[Figure 5: number of training images per class]

  • Here is the SppLayer code after my modifications:
# SppLayer definition, adapted from a GitHub repo; not written by me
import tensorflow as tf
# from keras.engine.topology import Layer
from tensorflow.python.keras.layers import Layer
import tensorflow.keras.backend as K


class SpatialPyramidPooling(Layer):
    """Spatial pyramid pooling layer for 2D inputs.
    See Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition,
    K. He, X. Zhang, S. Ren, J. Sun
    # Arguments
        pool_list: list of int
            List of pooling regions to use. The length of the list is the number of pooling regions,
            each int in the list is the number of regions in that pool. For example [1,2,4] would be 3
            regions with 1, 2x2 and 4x4 max pools, so 21 outputs per feature map
    # Input shape
        4D tensor with shape:
        `(samples, channels, rows, cols)` if dim_ordering='th'
        or 4D tensor with shape:
        `(samples, rows, cols, channels)` if dim_ordering='tf'.
    # Output shape
        2D tensor with shape:
        `(samples, channels * sum([i * i for i in pool_list]))`
    """

    def __init__(self, pool_list, **kwargs):

        self.dim_ordering = K.image_data_format()
        assert self.dim_ordering in {'channels_first', 'channels_last'}, \
            'dim_ordering must be in {channels_first, channels_last}'

        self.pool_list = pool_list

        self.num_outputs_per_channel = sum([i * i for i in pool_list])

        super(SpatialPyramidPooling, self).__init__(**kwargs)

    def build(self, input_shape):
        if self.dim_ordering == 'channels_first':
            self.nb_channels = input_shape[1]
        elif self.dim_ordering == 'channels_last':
            self.nb_channels = input_shape[3]

    def compute_output_shape(self, input_shape):
        return (input_shape[0], self.nb_channels * self.num_outputs_per_channel)

    def get_config(self):
        config = {'pool_list': self.pool_list}
        base_config = super(SpatialPyramidPooling, self).get_config()
        return dict(list(base_config.items()) + list(config.items()))

    def call(self, x, mask=None):

        input_shape = K.shape(x)

        if self.dim_ordering == 'channels_first':
            num_rows = input_shape[2]
            num_cols = input_shape[3]
        elif self.dim_ordering == 'channels_last':
            num_rows = input_shape[1]
            num_cols = input_shape[2]

        row_length = [K.cast(num_rows, 'float32') / i for i in self.pool_list]
        col_length = [K.cast(num_cols, 'float32') / i for i in self.pool_list]

        outputs = []

        if self.dim_ordering == 'channels_first':
            for pool_num, num_pool_regions in enumerate(self.pool_list):
                for jy in range(num_pool_regions):
                    for ix in range(num_pool_regions):
                        x1 = ix * col_length[pool_num]
                        x2 = ix * col_length[pool_num] + col_length[pool_num]
                        y1 = jy * row_length[pool_num]
                        y2 = jy * row_length[pool_num] + row_length[pool_num]

                        x1 = K.cast(K.round(x1), 'int32')
                        x2 = K.cast(K.round(x2), 'int32')
                        y1 = K.cast(K.round(y1), 'int32')
                        y2 = K.cast(K.round(y2), 'int32')
                        new_shape = [input_shape[0], input_shape[1],
                                     y2 - y1, x2 - x1]
                        x_crop = x[:, :, y1:y2, x1:x2]
                        xm = K.reshape(x_crop, new_shape)
                        pooled_val = K.max(xm, axis=(2, 3))
                        outputs.append(pooled_val)

        elif self.dim_ordering == 'channels_last':
            for pool_num, num_pool_regions in enumerate(self.pool_list):
                for jy in range(num_pool_regions):
                    for ix in range(num_pool_regions):
                        x1 = ix * col_length[pool_num]
                        x2 = ix * col_length[pool_num] + col_length[pool_num]
                        y1 = jy * row_length[pool_num]
                        y2 = jy * row_length[pool_num] + row_length[pool_num]

                        x1 = K.cast(K.round(x1), 'int32')
                        x2 = K.cast(K.round(x2), 'int32')
                        y1 = K.cast(K.round(y1), 'int32')
                        y2 = K.cast(K.round(y2), 'int32')

                        new_shape = [input_shape[0], y2 - y1,
                                     x2 - x1, input_shape[3]]

                        x_crop = x[:, y1:y2, x1:x2, :]
                        xm = K.reshape(x_crop, new_shape)
                        pooled_val = K.max(xm, axis=(1, 2))
                        outputs.append(pooled_val)

        if self.dim_ordering == 'channels_first':
            outputs = K.concatenate(outputs)
        elif self.dim_ordering == 'channels_last':
            #outputs = K.concatenate(outputs,axis = 1)
            outputs = K.concatenate(outputs)
            # outputs = K.reshape(outputs,(len(self.pool_list),self.num_outputs_per_channel,input_shape[0],input_shape[1]))
            #outputs = K.permute_dimensions(outputs,(3,1,0,2))
            outputs = K.reshape(outputs,(input_shape[0], self.num_outputs_per_channel * self.nb_channels))

        return outputs

The main changes are listed below:

Original                 Modified
K.image_dim_ordering()   K.image_data_format()
th                       channels_first
tf                       channels_last
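As a quick sanity check of the modified layer (my own snippet, not from the original repo), we can feed it a dummy feature map with arbitrary height and width and confirm that the output length equals sum(i*i for i in pool_list) x channels:

import tensorflow as tf

# Uses the SpatialPyramidPooling class defined above.
spp = SpatialPyramidPooling([1, 2, 4])
dummy = tf.random.uniform((2, 13, 17, 32))   # batch of 2, arbitrary H and W, 32 channels
print(spp(dummy).shape)                      # expected: (2, (1 + 4 + 16) * 32) = (2, 672)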

  • Now build our own model. Since this is only a learning exercise and the dataset provides no test set, I did not push for high accuracy.
# define the network structure
import tensorflow as tf
from tensorflow.keras.layers import Conv2D, Flatten, Lambda, MaxPooling2D, Dropout, Input, Dense,ZeroPadding2D,BatchNormalization
from tensorflow.python.keras import backend
from tensorflow.python.keras.engine import training
from tensorflow.python.keras.utils import layer_utils
from tensorflow.keras import optimizers, losses, initializers
def Spp_test_model(input_shape=(None, None, 3), input_tensor=None,classes=5):
    if input_tensor is None:
        img_input = Input(shape=input_shape)
    else:
        if not backend.is_keras_tensor(input_tensor):
            img_input = Input(tensor=input_tensor, shape=input_shape)
        else:
            img_input = input_tensor
    # build the model structure; the input size is deliberately left unspecified at first
    # block 1
    x = Conv2D(filters=32, kernel_size=(3, 3), strides=1, padding='valid', name='conv_block_1', activation='relu')(
        img_input)
    x = MaxPooling2D(pool_size=(2, 2), strides=1, name='max_pooling_1')(x)
    x=BatchNormalization()(x)
    # block 2
    x = Conv2D(filters=64, kernel_size=(3, 3), strides=1, padding='same', name='conv_block_2', activation='relu')(
        x)
    x = MaxPooling2D(pool_size=(2, 2), strides=1, name='max_pooling_2')(x)
    x=BatchNormalization()(x)
    # block 3
    x = Conv2D(filters=32, kernel_size=(3, 3), strides=1, padding='same', name='conv_block_3', activation='relu')(
        x)
    x = MaxPooling2D(pool_size=(2, 2), strides=1, name='max_pooling_3')(x)
    x=BatchNormalization()(x)
    x=SpatialPyramidPooling([1,2,4])(x) # the 1x1, 2x2 and 4x4 pooling grids described above; this SppLayer is the key piece
    x=Dense(5,activation='softmax')(x)
    if input_tensor is not None:
        inputs = layer_utils.get_source_inputs(input_tensor)
    else:
         inputs = img_input
    model = training.Model(inputs, x, name='SppNet')
    return model
model =Spp_test_model(classes=5)
model.summary()

# split into training and validation sets
from tensorflow import keras
from sklearn.model_selection import train_test_split
X_train, X_val, y_train, y_val = train_test_split(image_data, image_labels, train_size=0.7,random_state=42,
                                                  shuffle=True)
del image_data
del image_labels
X_train = X_train/ 255.0 # normalize to [0, 1]
X_val = X_val / 255.0 # normalize to [0, 1]
y_train = keras.utils.to_categorical(y_train,5)
y_val = keras.utils.to_categorical(y_val,5)
print("X_train.shape", X_train.shape)
print("X_valid.shape", X_val.shape)
print("y_train.shape", y_train.shape)
print("y_valid.shape", y_val.shape)

# training
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.optimizers import Adam,SGD
lr=0.0001
epochs=15
opt=Adam(lr=lr,decay=lr/(epochs/0.5))
model.compile(loss='categorical_crossentropy',optimizer=opt,metrics=['acc'])
aug = ImageDataGenerator(
    rotation_range=10,
    zoom_range=0.15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.15,
    horizontal_flip=False,
    vertical_flip=False,
    fill_mode='nearest'
)
history = model.fit(X_train, y_train, batch_size=50,
                    epochs=epochs, validation_data=(X_val, y_val))
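One thing worth noting: the ImageDataGenerator above is created but never passed to model.fit, so no augmentation is actually applied in this run. A minimal sketch of how it could be wired in (my own suggestion, not part of the original experiment):

# Train from the augmenter's generator instead of the raw arrays.
history = model.fit(
    aug.flow(X_train, y_train, batch_size=50),
    steps_per_epoch=len(X_train) // 50,
    epochs=epochs,
    validation_data=(X_val, y_val),
)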


Network structure:

Model: "SppNet"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, None, None, 3)]   0         
_________________________________________________________________
conv_block_1 (Conv2D)        (None, None, None, 32)    896       
_________________________________________________________________
max_pooling_1 (MaxPooling2D) (None, None, None, 32)    0         
_________________________________________________________________
batch_normalization (BatchNo (None, None, None, 32)    128       
_________________________________________________________________
conv_block_2 (Conv2D)        (None, None, None, 64)    18496     
_________________________________________________________________
max_pooling_2 (MaxPooling2D) (None, None, None, 64)    0         
_________________________________________________________________
batch_normalization_1 (Batch (None, None, None, 64)    256       
_________________________________________________________________
conv_block_3 (Conv2D)        (None, None, None, 32)    18464     
_________________________________________________________________
max_pooling_3 (MaxPooling2D) (None, None, None, 32)    0         
_________________________________________________________________
batch_normalization_2 (Batch (None, None, None, 32)    128       
_________________________________________________________________
spatial_pyramid_pooling (Spa (None, 672)               0         
_________________________________________________________________
dense (Dense)                (None, 5)                 3365      
=================================================================
Total params: 41,733
Trainable params: 41,477
Non-trainable params: 256
_________________________________________________________________
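As a quick check of the numbers above (my own arithmetic): the SPP layer outputs (1 + 4 + 16) x 32 = 672 values regardless of the input resolution, so the final Dense layer always has 672 x 5 + 5 = 3365 parameters.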

Training log (only part of it is shown):

Epoch 11/15
61/61 [==============================] - 75s 1s/step - loss: 0.6994 - acc: 0.7422 - val_loss: 0.9164 - val_acc: 0.6530
Epoch 12/15
61/61 [==============================] - 75s 1s/step - loss: 0.6872 - acc: 0.7438 - val_loss: 0.9210 - val_acc: 0.6515
Epoch 13/15
61/61 [==============================] - 75s 1s/step - loss: 0.6831 - acc: 0.7504 - val_loss: 0.8881 - val_acc: 0.6623
Epoch 14/15
61/61 [==============================] - 75s 1s/step - loss: 0.6232 - acc: 0.7710 - val_loss: 0.8825 - val_acc: 0.6677
Epoch 15/15
61/61 [==============================] - 75s 1s/step - loss: 0.6121 - acc: 0.7739 - val_loss: 0.8817 - val_acc: 0.6669

That wraps everything up~

Summary

In practice we still tend to fix the image size during training; SPP and GlobalAveragePooling2D simply let us ignore the link between image size and the fully connected layers. Flatten alone produces a lot of parameters, but if the count is still acceptable, go ahead and use it. My original plan was to read the images at their native sizes, without restricting them, and feed them straight into the network. During loading, however, the images go into a list that is then converted to a numpy array for computation, and because the images have different sizes the list elements have different shapes and the conversion fails. If any reader knows a clean way around this, please point me in the right direction. Finally, happy May Day everyone~
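For what it is worth, one workaround I can think of (a minimal sketch under the assumption that a batch size of 1 is acceptable; image_list and label_list are hypothetical names for the raw, differently sized images and their integer labels kept in plain Python lists) is to skip the big numpy array entirely and feed the model from a generator that yields one image at a time:

import numpy as np
import tensorflow as tf

def variable_size_generator(image_list, label_list, num_classes=5):
    # Yields one (1, H, W, 3) image and its one-hot label at a time; batch size is fixed at 1.
    while True:
        for img, label in zip(image_list, label_list):
            x = np.expand_dims(img.astype('float32') / 255.0, axis=0)
            y = np.expand_dims(tf.keras.utils.to_categorical(label, num_classes), axis=0)
            yield x, y

# model.fit(variable_size_generator(image_list, label_list), steps_per_epoch=len(image_list), epochs=epochs)

Since the model's input shape is (None, None, 3), each image can keep its own size; the price is a batch size of 1, which also makes the BatchNormalization statistics noisier.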

References

The figures in the Flatten and GlobalAveragePooling2D sections come from Google image search; the figures in the SpatialPyramidPooling section come from the original paper.

  • Links:
[1] https://zhuanlan.zhihu.com/p/79888509
[2] https://www.cnblogs.com/zongfa/p/9076311.html
[3] https://phimos.github.io/2020/07/21/RN-SPPLayer/
[4] https://github.com/yhenon/keras-spp/blob/master/spp/SpatialPyramidPooling.py
[5] https://mermaid-js.github.io/mermaid/#/flowchart?id=interaction
[6] https://blog.csdn.net/u011616825/article/details/112302220
[7] https://zh-v2.d2l.ai/chapter_convolutional-neural-networks/lenet.html
  • Reference books:
深度学习:卷积神经网络从入门到精通 (Deep Learning: Convolutional Neural Networks from Beginner to Mastery)
动手学深度学习 (Dive into Deep Learning)
