Pyramid Scene Parsing Network
Keras implementation: https://github.com/BBuf/Keras-Semantic-Segmentation
Most work on semantic segmentation models focuses on two aspects:
To further reduce the loss of contextual information between different sub-regions, this paper proposes a hierarchical global prior containing information at different scales that varies across sub-regions. This is called the pyramid pooling module, and it is used to construct a global scene prior on top of the final-layer feature map of the deep neural network, as shown in Figure 3(c).
The module fuses features at four different pyramid scales. The first row (red) is the coarsest feature: global pooling that produces a single-bin output; the following three rows are pooled features at different scales. To maintain the weight of the global feature, if the pyramid has N levels, a 1×1 convolution is applied after each level to reduce the number of channels to 1/N of the original. Each level is then up-sampled via bilinear interpolation back to the pre-pooling size, and finally all of them are concatenated together.
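As a quick sanity check of the channel arithmetic above, the snippet below replays it with the paper's setting (a 2048-channel res5 feature map from ResNet101 and N = 4 pyramid levels); the numbers are just an illustration of the 1/N reduction, not part of the Keras code later in this post:

```python
# Channel bookkeeping for the pyramid pooling module.
# Assumption: the backbone's final feature map has C = 2048 channels
# (ResNet101 res5) and the pyramid has N = 4 levels, as in the paper.
C, N = 2048, 4

# Each level's 1x1 convolution reduces channels to 1/N of the original.
per_level = C // N          # 2048 / 4 = 512 channels per pyramid level

# After bilinear upsampling, all N levels are concatenated with the
# original feature map.
fused = C + N * per_level   # 2048 + 4 * 512 = 4096 channels

print(per_level, fused)     # -> 512 4096
```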
The pooling kernel size of each pyramid level is configurable and depends on the input fed into the pyramid. The paper uses four levels, with kernel sizes 1×1, 2×2, 3×3, and 6×6.
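The `pool_block` below derives each kernel from the feature-map size rather than hard-coding it: the pool size (= stride) is the input size divided by the level, rounded. On a 24×24 feature map (the size used as an example in this post), this yields exactly 1, 2, 3 and 6 output bins for the four levels, matching the paper. A quick sketch of that arithmetic:

```python
import math

h = 24                       # example spatial size of the final feature map
levels = [1, 2, 3, 6]        # the four pyramid levels from the paper

bins = []
for p in levels:
    pool = int(round(h / p))       # pooling kernel = stride, as in pool_block
    out = math.ceil(h / pool)      # output bins with 'same' padding
    bins.append(out)

print(bins)                  # -> [1, 2, 3, 6]
```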
Figure 4 shows an example of the deeply supervised ResNet101 model used in this paper. Besides the main branch, which uses a softmax loss to train the final classifier, another classifier is applied after the fourth stage, i.e. the res4b22 residual block. Unlike relay backpropagation, which blocks the backward auxiliary loss at several shallow layers, here both loss functions are propagated through all preceding layers. The auxiliary loss helps optimize the learning process, while the main branch loss takes the primary responsibility; a weight is added to balance the auxiliary loss.
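The paper sets that balancing weight to 0.4 in its experiments. A minimal numerical sketch of the weighted sum, using toy cross-entropy values as placeholders (the concrete loss numbers below are made up for illustration):

```python
# Toy mean cross-entropy losses for the two classifiers (placeholders).
main_loss = 1.25   # loss of the main (final) classifier
aux_loss = 1.60    # loss of the auxiliary classifier after res4b22

alpha = 0.4        # auxiliary-loss weight used in the paper's experiments
total_loss = main_loss + alpha * aux_loss   # 1.25 + 0.4 * 1.60 = 1.89

# During training both terms backpropagate through all earlier layers;
# only the weighting differs between the two branches.
print(total_loss)  # -> 1.89
```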
The model code implemented in Keras:
#coding=utf-8
import numpy as np  # needed for np.round in pool_block
import tensorflow as tf
import keras.backend as K
from keras.models import *
from keras.layers import *
# Pyramid pooling block (POOL): pool at a given factor, reduce channels
# with a 1x1 convolution, then bilinearly upsample back to the input size.
def pool_block(inp, pool_factor):
    h = K.int_shape(inp)[1]
    w = K.int_shape(inp)[2]
    pool_size = strides = [int(np.round(float(h) / pool_factor)),
                           int(np.round(float(w) / pool_factor))]
    x = AveragePooling2D(pool_size, strides=strides, padding='same')(inp)
    x = Conv2D(256, (1, 1), padding='same', activation='relu')(x)
    x = BatchNormalization()(x)
    # Bilinear up-sampling back to the pre-pooling spatial size
    x = Lambda(lambda x: tf.image.resize_bilinear(
        x, size=(int(x.shape[1]) * strides[0], int(x.shape[2]) * strides[1])))(x)
    x = Conv2D(256, (1, 1), padding='same', activation='relu')(x)
    return x
def PSPNet(nClasses, input_width=384, input_height=384):
    assert input_height % 192 == 0
    assert input_width % 192 == 0
    inputs = Input(shape=(input_height, input_width, 3))

    x = Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
    x = BatchNormalization()(x)
    x = MaxPooling2D((2, 2))(x)
    f1 = x
    # 192 x 192
    x = Conv2D(128, (3, 3), activation='relu', padding='same')(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((2, 2))(x)
    f2 = x
    # 96 x 96
    x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((2, 2))(x)
    f3 = x
    # 48 x 48
    x = Conv2D(256, (3, 3), activation='relu', padding='same')(x)
    x = BatchNormalization()(x)
    x = MaxPooling2D((2, 2))(x)
    f4 = x
    # 24 x 24
    o = f4

    pool_factors = [1, 2, 3, 6]  # the four pyramid pooling scales
    pool_outs = [o]
    for p in pool_factors:
        pooled = pool_block(o, p)
        pool_outs.append(pooled)
    # Concatenate the up-sampled pyramid features with the backbone features
    o = Concatenate(axis=3)(pool_outs)

    o = Conv2D(256, (3, 3), activation='relu', padding='same')(o)
    o = BatchNormalization()(o)
    # Bilinear up-sampling by a factor of 8 (applied to o, not x)
    o = Lambda(lambda x: tf.image.resize_bilinear(
        x, size=(int(x.shape[1]) * 8, int(x.shape[2]) * 8)))(o)
    o = Conv2D(nClasses, (1, 1), padding='same')(o)

    o_shape = Model(inputs, o).output_shape
    outputHeight = o_shape[1]
    outputWidth = o_shape[2]
    print(outputHeight)
    print(outputWidth)

    o = Reshape((outputHeight * outputWidth, nClasses))(o)
    o = Activation('softmax')(o)
    model = Model(inputs, o)
    model.outputWidth = outputWidth
    model.outputHeight = outputHeight
    return model
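A quick check of the shapes this network produces, as pure arithmetic (no Keras needed): four 2×2 max-poolings divide the input by 16, and the final bilinear up-sampling multiplies by 8, so the output is at half the input resolution; requiring the input to be a multiple of 192 also makes the 24×24-style feature map divisible by every pool factor up to 6. The 384 input and the class count below are just example values:

```python
input_size = 384             # default input size in the code above
n_classes = 21               # hypothetical class count (e.g. Pascal VOC)

feat = input_size // 16      # four 2x2 max-poolings: 384 -> 24
assert feat % 6 == 0         # guaranteed by the input % 192 == 0 check
out = feat * 8               # bilinear up-sampling by 8: 24 -> 192

# The final softmax is applied over a flattened (H*W, nClasses) tensor.
flat = out * out

print(feat, out, flat)       # -> 24 192 36864
```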