A First Look at the Tensorflow Object Detection API Source Code (1.2): FeatureExtractor.Config

  • type
  • batch_norm_trainable
  • first_stage_features_stride

message FasterRcnnFeatureExtractor {
  // Type of Faster R-CNN model (e.g., 'faster_rcnn_resnet101';
  // See builders/model_builder.py for expected types).
  optional string type = 1;

  // Output stride of extracted RPN feature map.
  optional int32 first_stage_features_stride = 2 [default=16];

  // Whether to update batch norm parameters during training or not.
  // When training with a relative large batch size (e.g. 8), it could be
  // desirable to enable batch norm update.
  optional bool batch_norm_trainable = 3 [default=false];
}
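To make the defaults easier to see at a glance, here is a plain-Python mirror of the three fields. This is only a sketch: `FeatureExtractorConfigSketch` is a hypothetical name of mine, not the class that protobuf generates from this message.

```python
from dataclasses import dataclass


@dataclass
class FeatureExtractorConfigSketch:
    """Hypothetical plain-Python mirror of the FasterRcnnFeatureExtractor message."""

    # Field 1: the proto declares no default; proto2 reads an unset string as "".
    type: str = ""
    # Field 2: [default=16] in the proto.
    first_stage_features_stride: int = 16
    # Field 3: [default=false] in the proto.
    batch_norm_trainable: bool = False


# Only `type` is set here, so the other two fields fall back to their defaults.
config = FeatureExtractorConfigSketch(type="faster_rcnn_resnet101")
print(config)
```

A config that sets only `type` therefore ends up with a stride of 16 and batch norm updates switched off.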

type

Type: string
Value: faster_rcnn_resnet101, etc.

batch_norm_trainable

Default: false

first_stage_features_stride

The __init__ of FasterRCNNResnetV1FeatureExtractor receives this setting and then passes it on to the __init__ of its parent class, FasterRCNNFeatureExtractor

if first_stage_features_stride != 8 and first_stage_features_stride != 16:
    raise ValueError('`first_stage_features_stride` must be 8 or 16.')

Note: this setting can only be 8 or 16; any other value raises an error

In the __init__ of FasterRCNNFeatureExtractor, we then see

self._first_stage_features_stride = first_stage_features_stride

The value is stored in a private attribute. I then searched this class and its subclasses for _first_stage_features_stride and found the following in _extract_proposal_features

with tf.variable_scope(
    self._architecture, reuse=self._reuse_weights) as var_scope:
  _, activations = self._resnet_model(
      preprocessed_inputs,
      num_classes=None,
      is_training=self._train_batch_norm,
      global_pool=False,
      output_stride=self._first_stage_features_stride,
      spatial_squeeze=False,
      scope=var_scope)

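The value is then passed on as the output_stride argument of _resnet_model. Convoluted, isn't it? Object-oriented code tends to take detours like this; you get used to it. As the name suggests, _resnet_model is the function that builds the ResNet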
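The chain traced so far (config value, range check, private attribute, output_stride argument) can be condensed into a small sketch. Everything here is a stand-in of mine: the two `...Sketch` classes and `fake_resnet` are hypothetical, the check is placed in the subclass as an assumption of the sketch, and the real classes take many more arguments.

```python
class FeatureExtractorSketch:
    """Hypothetical stand-in for the parent class FasterRCNNFeatureExtractor."""

    def __init__(self, first_stage_features_stride):
        # The parent only stores the value in a private attribute.
        self._first_stage_features_stride = first_stage_features_stride


class ResnetFeatureExtractorSketch(FeatureExtractorSketch):
    """Hypothetical stand-in for FasterRCNNResnetV1FeatureExtractor."""

    def __init__(self, resnet_model, first_stage_features_stride):
        # Same range check as quoted above (its placement here is an assumption).
        if first_stage_features_stride not in (8, 16):
            raise ValueError('`first_stage_features_stride` must be 8 or 16.')
        self._resnet_model = resnet_model
        super().__init__(first_stage_features_stride)

    def extract_proposal_features(self, preprocessed_inputs):
        # The stored value resurfaces here as the `output_stride` keyword.
        return self._resnet_model(
            preprocessed_inputs,
            output_stride=self._first_stage_features_stride)


def fake_resnet(inputs, output_stride=None):
    """Hypothetical model stub that only reports the stride it was given."""
    return {'inputs': inputs, 'output_stride': output_stride}


extractor = ResnetFeatureExtractorSketch(fake_resnet, first_stage_features_stride=16)
result = extractor.extract_proposal_features('image')
print(result['output_stride'])
```

Swapping `fake_resnet` for another stub is enough to see which value reaches the model, without loading TensorFlow at all.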

The explanation of output_stride can be found in resnet_v1.py

output_stride: If None, then the output will be computed at the nominal
network stride. If output_stride is not None, it specifies the requested
ratio of input to output spatial resolution.

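In other words, it defines the ratio of input resolution to output resolution.
But then something seems off: the FeatureExtractor only uses the output of the first three ResNet blocks and swaps in its own block4, so why does the output resolution ratio stay the same?
Two points explain this:
1. In resnet50, 101, 152, 200 and so on, block4 has a stride of 1, and the FeatureExtractor's own block4 also has a stride of 1, so the replacement does not affect the stride;
2. By default ResNet applies conv2d and max_pool2d at the input stage, which already downsample by 4, so the code divides output_stride by 4 (output_stride /= 4)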

if include_root_block:
  if output_stride is not None:
    if output_stride % 4 != 0:
      raise ValueError('The output_stride needs to be a multiple of 4.')
    output_stride /= 4
  net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')
  net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')

include_root_block defaults to True
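The arithmetic of the root block is easy to check by hand. Below is a minimal sketch of it; `stride_left_for_blocks` is a hypothetical helper of mine, not a function in resnet_v1.py, and it uses Python 3 integer division (`//`) where the quoted source writes `/=`.

```python
def stride_left_for_blocks(output_stride, include_root_block=True):
    """Hypothetical helper: the stride the residual blocks still have to provide."""
    if output_stride is None:
        # No stride requested: the network runs at its nominal stride.
        return None
    if include_root_block:
        if output_stride % 4 != 0:
            raise ValueError('The output_stride needs to be a multiple of 4.')
        # conv1 (stride 2) followed by pool1 (stride 2) already downsamples by 4.
        output_stride //= 4
    return output_stride


# The two values that first_stage_features_stride is allowed to take.
for requested in (8, 16):
    print(requested, '->', stride_left_for_blocks(requested))
```

With the default of 16, the blocks after the root block only need to contribute a further factor of 4; with 8, a factor of 2.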
