@: train_faster_rcnn_alt_opt.py
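This file implements the alternating optimization training scheme. The NIPS paper only describes it briefly, so I think reading the source code is well worth the effort.
Walkthroughs of the other main modules will follow in later posts. My own understanding is limited: if something here is off, or if an important point is glossed over and you want to dig deeper, leave a comment so we can discuss it and improve together!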
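The file contains the following functions, each described briefly:
- parse_args()
  Parses the command-line arguments.
- get_roidb(imdb_name, rpn_file=None)
  Returns the roidb and the imdb.
- get_solvers(net_name)
  Returns the solver files.
- _init_caffe(cfg)
  Initializes caffe according to the config.
- train_rpn(queue, imdb_name, init_model, solver,
  max_iters, cfg)
  Trains the RPN.
- rpn_generate(queue, imdb_name, rpn_model_path, cfg,
  rpn_test_prototxt)
  Uses the RPN to generate proposals.
- train_fast_rcnn(queue, imdb_name, init_model, solver,
  max_iters, cfg, rpn_file)
  Trains the Fast R-CNN network on the proposals produced by the RPN.
- Finally, the main block, which spells out the four training stages in detail.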
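The imdb is looked up by imdb_name (the default is "voc_2007_trainval"). The lookup uses the factory pattern: lib/datasets/factory.py builds the object and, given the year (2007) and the split (trainval), returns a pascal_voc object. Both pascal_voc and coco inherit from imdb (lib/datasets/pascal_voc.py and coco.py).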
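To make the factory idea concrete, here is a minimal sketch of a name-to-constructor registry. It is not the actual contents of factory.py: the classes Imdb and PascalVoc and the variable _registry are simplified stand-ins I made up for illustration.

```python
# Hypothetical, simplified registry in the spirit of a dataset factory.


class Imdb(object):
    """Minimal stand-in for the imdb base class."""

    def __init__(self, name):
        self.name = name


class PascalVoc(Imdb):
    """Stand-in for pascal_voc: one dataset per (split, year) pair."""

    def __init__(self, split, year):
        Imdb.__init__(self, 'voc_{}_{}'.format(year, split))
        self.split = split
        self.year = year


# Map each dataset name to a zero-argument constructor.
_registry = {}
for year in ['2007', '2012']:
    for split in ['train', 'val', 'trainval', 'test']:
        name = 'voc_{}_{}'.format(year, split)
        # Bind split/year as defaults so every lambda keeps its own values.
        _registry[name] = (lambda split=split, year=year:
                           PascalVoc(split, year))


def get_imdb(name):
    """Build the dataset object registered under `name`."""
    if name not in _registry:
        raise KeyError('Unknown dataset: {}'.format(name))
    return _registry[name]()


imdb = get_imdb('voc_2007_trainval')
print(imdb.name, imdb.year, imdb.split)
```

The caller only ever passes a string, so adding a new dataset means registering one more name without touching the training script.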
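The roidb is obtained through get_training_roidb in lib/fast_rcnn/train.py. It is a member of the imdb and holds the annotated regions of every training image. The function does two things: first, when cfg.TRAIN.USE_FLIPPED is set, it horizontally flips every image in the existing roidb and appends the flipped copies to it; second, it does some preparation work (parts of which left me speechless...). The details are covered in the write-up of that file.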
def get_roidb(imdb_name, rpn_file=None):
    imdb = get_imdb(imdb_name)
    print 'Loaded dataset {:s} for training'.format(imdb.name)
    imdb.set_proposal_method(cfg.TRAIN.PROPOSAL_METHOD)
    print 'Set proposal method: {:s}'.format(cfg.TRAIN.PROPOSAL_METHOD)
    if rpn_file is not None:
        imdb.config['rpn_file'] = rpn_file
    roidb = get_training_roidb(imdb)
    return roidb, imdb
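The horizontal flipping mentioned in the description of get_training_roidb can be sketched as follows. This is a simplified illustration under my own assumptions: boxes are [x1, y1, x2, y2] lists with 0-based inclusive pixel coordinates, and the roidb entries are plain dicts with only the keys shown. The helper names flip_boxes and append_flipped are hypothetical.

```python
def flip_boxes(boxes, width):
    """Mirror [x1, y1, x2, y2] boxes horizontally inside an image of `width`."""
    flipped = []
    for x1, y1, x2, y2 in boxes:
        # The old right edge becomes the new left edge, and vice versa.
        flipped.append([width - x2 - 1, y1, width - x1 - 1, y2])
    return flipped


def append_flipped(roidb):
    """Return the roidb followed by a flipped copy of every entry."""
    flipped_entries = []
    for entry in roidb:
        flipped_entries.append({
            'boxes': flip_boxes(entry['boxes'], entry['width']),
            'width': entry['width'],
            'flipped': True,
        })
    return roidb + flipped_entries


roidb = [{'boxes': [[10, 20, 49, 80]], 'width': 100, 'flipped': False}]
augmented = append_flipped(roidb)
print(len(augmented))          # twice as many entries as before
print(augmented[1]['boxes'])   # the mirrored box
```

Only the x coordinates change, and the box keeps its width, which is a quick sanity check for this kind of augmentation.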
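The folder models/pascal_voc/netname/faster_rcnn_alt_opt holds one solver file per training stage, such as stage1_rpn_solver60k80k.pt. This function returns those solver paths together with the maximum number of iterations for each stage. The supported net_name values are VGG16, VGG_CNN_M_1024, and ZF.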
def get_solvers(net_name):
    # Faster R-CNN Alternating Optimization
    n = 'faster_rcnn_alt_opt'
    # Solver for each training stage
    solvers = [[net_name, n, 'stage1_rpn_solver60k80k.pt'],
               [net_name, n, 'stage1_fast_rcnn_solver30k40k.pt'],
               [net_name, n, 'stage2_rpn_solver60k80k.pt'],
               [net_name, n, 'stage2_fast_rcnn_solver30k40k.pt']]
    solvers = [os.path.join(cfg.MODELS_DIR, *s) for s in solvers]
    # Iterations for each training stage
    max_iters = [80000, 40000, 80000, 40000]
    # max_iters = [100, 100, 100, 100]
    # Test prototxt for the RPN
    rpn_test_prototxt = os.path.join(
        cfg.MODELS_DIR, net_name, n, 'rpn_test.pt')
    return solvers, max_iters, rpn_test_prototxt
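The `os.path.join(cfg.MODELS_DIR, *s)` idiom in get_solvers may look odd at first: the `*` unpacks each three-element list into separate arguments. A small sketch, assuming a made-up value for MODELS_DIR (the real one comes from the config):

```python
import os

# Assumed stand-in for cfg.MODELS_DIR; the real value is set in the config.
MODELS_DIR = os.path.join('models', 'pascal_voc')

net_name = 'ZF'
n = 'faster_rcnn_alt_opt'
parts = [net_name, n, 'stage1_rpn_solver60k80k.pt']

# Equivalent to os.path.join(MODELS_DIR, parts[0], parts[1], parts[2]).
solver_path = os.path.join(MODELS_DIR, *parts)
print(solver_path)

# The solver list and max_iters are index-aligned, one entry per stage.
stage_names = ['stage1_rpn', 'stage1_fast_rcnn', 'stage2_rpn', 'stage2_fast_rcnn']
max_iters = [80000, 40000, 80000, 40000]
schedule = dict(zip(stage_names, max_iters))
print(schedule['stage2_rpn'])
```

Because the two lists are aligned by position, the main block can simply use solvers[i] and max_iters[i] for stage i.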
This function initializes caffe inside a training process. It does only two things: it fixes the random seeds, then it selects the GPU.
def _init_caffe(cfg):
    """Initialize pycaffe in a training process.
    """
    import caffe
    # fix the random seeds (numpy and caffe) for reproducibility
    np.random.seed(cfg.RNG_SEED)
    caffe.set_random_seed(cfg.RNG_SEED)
    # set up caffe
    caffe.set_mode_gpu()
    caffe.set_device(cfg.GPU_ID)
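As the name says, this function trains the RPN. Personally I find the way the RPN works to be the hardest part of Faster R-CNN to understand, and also the heart of its appeal.
The function first sets a few config values. One that is easy to overlook is that PROPOSAL_METHOD is set to 'gt'. The setting is passed on in get_roidb and takes effect down in the dataset class pascal_voc: imdb (the parent class of pascal_voc) uses roidb_handler to decide how the roidb is generated. The default is selective_search; here gt_roidb is used instead.
Next, the function does some preparation for training, such as initializing the model and resolving the output path.
Then the actual training starts. The training details are described more fully in other chapters.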
def train_rpn(queue=None, imdb_name=None, init_model=None, solver=None,
              max_iters=None, cfg=None):
    """Train a Region Proposal Network in a separate training process.
    """
    # Not using any proposals, just ground-truth boxes
    cfg.TRAIN.HAS_RPN = True
    cfg.TRAIN.BBOX_REG = False  # applies only to Fast R-CNN box regression
    # Build the roidb from the ground-truth boxes
    cfg.TRAIN.PROPOSAL_METHOD = 'gt'
    # Number of images per mini-batch; the default is 2, changed to 1 here
    cfg.TRAIN.IMS_PER_BATCH = 1
    print 'Init model: {}'.format(init_model)
    print('Using config:')
    pprint.pprint(cfg)
    # Initialize caffe
    import caffe
    _init_caffe(cfg)
    roidb, imdb = get_roidb(imdb_name)
    print 'roidb len: {}'.format(len(roidb))
    output_dir = get_output_dir(imdb)
    print 'Output will be saved to {:s}'.format(output_dir)
    model_paths = train_net(solver, roidb, output_dir,
                            pretrained_model=init_model,
                            max_iters=max_iters)
    # Cleanup all but the final model
    for i in model_paths[:-1]:
        os.remove(i)
    rpn_model_path = model_paths[-1]
    # Send final model path through the multiprocessing queue
    queue.put({'model_path': rpn_model_path})
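The way PROPOSAL_METHOD ends up choosing a roidb generator, as described above, can be sketched like this. It is a toy illustration, not the real imdb class: ToyImdb and the strings it returns are made up, and I use getattr to look up the handler by name, which may differ from how the actual code resolves it.

```python
class ToyImdb(object):
    """Toy stand-in showing how a method name selects the roidb handler."""

    def __init__(self):
        # Default handler, matching the default described in the text.
        self.roidb_handler = self.selective_search_roidb

    def set_proposal_method(self, method):
        # 'gt' -> self.gt_roidb, 'rpn' -> self.rpn_roidb, and so on.
        self.roidb_handler = getattr(self, method + '_roidb')

    @property
    def roidb(self):
        # The roidb is produced by whichever handler is currently selected.
        return self.roidb_handler()

    def selective_search_roidb(self):
        return 'selective search proposals'

    def gt_roidb(self):
        return 'ground-truth boxes'

    def rpn_roidb(self):
        return 'precomputed RPN proposals'


imdb = ToyImdb()
print(imdb.roidb)                 # default handler
imdb.set_proposal_method('gt')    # what train_rpn asks for
print(imdb.roidb)
imdb.set_proposal_method('rpn')   # what train_fast_rcnn asks for
print(imdb.roidb)
```

The training functions never call a handler directly; they only set a string in the config, and the dataset object does the dispatch.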
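This function generates proposals from the given dataset, trained model, and prototxt. The proposals can feed the next training step or be used for testing.
The single most important line is rpn_proposals = imdb_proposals(rpn_net, imdb); everything else is argument preparation and output handling. If you only want the gist, it is enough to know that the function produces at most 2000 RoI proposals with their scores for every image and caches them in a file on disk.
If you want to get to the bottom of it, follow the call into the function and read the prototxt file for the implementation details. I will also write a separate post on proposal_layer to discuss the important parts of its implementation.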
def rpn_generate(queue=None, imdb_name=None, rpn_model_path=None, cfg=None,
                 rpn_test_prototxt=None):
    """Use a trained RPN to generate proposals.
    """
    cfg.TEST.RPN_PRE_NMS_TOP_N = -1     # no pre NMS filtering
    cfg.TEST.RPN_POST_NMS_TOP_N = 2000  # limit top boxes after NMS
    print 'RPN model: {}'.format(rpn_model_path)
    print('Using config:')
    pprint.pprint(cfg)
    import caffe
    _init_caffe(cfg)
    # NOTE: the matlab implementation computes proposals on flipped images, too.
    # We compute them on the image once and then flip the already computed
    # proposals. This might cause a minor loss in mAP (less proposal jittering).
    imdb = get_imdb(imdb_name)
    print 'Loaded dataset {:s} for proposal generation'.format(imdb.name)
    # Load RPN and configure output directory
    rpn_net = caffe.Net(rpn_test_prototxt, rpn_model_path, caffe.TEST)
    output_dir = get_output_dir(imdb)
    print 'Output will be saved to {:s}'.format(output_dir)
    # Generate proposals on the imdb
    rpn_proposals = imdb_proposals(rpn_net, imdb)
    # Write proposals to disk and send the proposal file path through the
    # multiprocessing queue
    rpn_net_name = os.path.splitext(os.path.basename(rpn_model_path))[0]
    rpn_proposals_path = os.path.join(
        output_dir, rpn_net_name + '_proposals.pkl')
    with open(rpn_proposals_path, 'wb') as f:
        cPickle.dump(rpn_proposals, f, cPickle.HIGHEST_PROTOCOL)
    print 'Wrote RPN proposals to {}'.format(rpn_proposals_path)
    queue.put({'proposal_path': rpn_proposals_path})
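To show what the two TOP_N settings mean, here is a pure-Python sketch of score sorting, greedy NMS, and the post-NMS cap. It is a simplified illustration and not the proposal_layer implementation: the function names iou and select_proposals are made up, and the 0.7 NMS threshold is an assumption of this sketch.

```python
def iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes with inclusive pixel coordinates."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]) + 1)
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]) + 1)
    inter = iw * ih
    area_a = (a[2] - a[0] + 1) * (a[3] - a[1] + 1)
    area_b = (b[2] - b[0] + 1) * (b[3] - b[1] + 1)
    return inter / float(area_a + area_b - inter)


def select_proposals(boxes, scores, nms_thresh=0.7,
                     pre_nms_top_n=-1, post_nms_top_n=2000):
    """Return the indices of the proposals kept, best score first."""
    # Rank all boxes by score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # A non-positive pre_nms_top_n means no filtering before NMS.
    if pre_nms_top_n > 0:
        order = order[:pre_nms_top_n]
    # Greedy NMS: keep a box only if it does not overlap a kept box too much.
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_thresh for j in keep):
            keep.append(i)
    # Cap the number of survivors after NMS.
    if post_nms_top_n > 0:
        keep = keep[:post_nms_top_n]
    return keep


boxes = [[0, 0, 9, 9], [0, 1, 9, 10], [50, 50, 59, 59]]
scores = [0.8, 0.9, 0.7]
print(select_proposals(boxes, scores))
print(select_proposals(boxes, scores, post_nms_top_n=1))
```

Box 0 overlaps the higher-scoring box 1 heavily, so it is suppressed, while the distant box 2 survives. With -1 before NMS and 2000 after, every candidate takes part in NMS and at most 2000 survive per image.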
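This function trains the Fast R-CNN part. It first sets the roidb generation method to rpn_roidb; the factory-style idea behind how the roidb is obtained was covered above.
After that it prepares the arguments, paths, and so on that are fed into training, and finally saves the model. To really get to the bottom of the training process you need to read the prototxt files.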
def train_fast_rcnn(queue=None, imdb_name=None, init_model=None, solver=None,
                    max_iters=None, cfg=None, rpn_file=None):
    """Train a Fast R-CNN using proposals generated by an RPN.
    """
    # This flag exists for code reuse: in other files the train_rpn and
    # train_fast_rcnn paths would otherwise duplicate code, so HAS_RPN is
    # used to merge them and select between the two.
    cfg.TRAIN.HAS_RPN = False           # not generating proposals on-the-fly
    cfg.TRAIN.PROPOSAL_METHOD = 'rpn'   # use pre-computed RPN proposals instead
    # Each mini-batch holds two images plus the RoIs from their proposals
    cfg.TRAIN.IMS_PER_BATCH = 2
    print 'Init model: {}'.format(init_model)
    print 'RPN proposals: {}'.format(rpn_file)
    print('Using config:')
    pprint.pprint(cfg)
    import caffe
    _init_caffe(cfg)
    roidb, imdb = get_roidb(imdb_name, rpn_file=rpn_file)
    output_dir = get_output_dir(imdb)
    print 'Output will be saved to {:s}'.format(output_dir)
    # Train Fast R-CNN
    model_paths = train_net(solver, roidb, output_dir,
                            pretrained_model=init_model,
                            max_iters=max_iters)
    # Cleanup all but the final model
    for i in model_paths[:-1]:
        os.remove(i)
    fast_rcnn_model_path = model_paths[-1]
    # Send Fast R-CNN model path over the multiprocessing queue
    queue.put({'model_path': fast_rcnn_model_path})
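Both training functions end with the same pattern: delete every intermediate snapshot, keep the last one, and report its path through the queue. Here is a small runnable sketch of that pattern. The helper name keep_final_snapshot and the file names are made up, and a plain queue.Queue stands in for the multiprocessing queue so the example runs in one process.

```python
import os
import queue
import tempfile


def keep_final_snapshot(model_paths, result_queue):
    """Delete all snapshots except the last and report the survivor."""
    for path in model_paths[:-1]:
        os.remove(path)
    final_path = model_paths[-1]
    # The parent process reads this dict to learn where the model is.
    result_queue.put({'model_path': final_path})
    return final_path


# Create three placeholder snapshot files in a temporary directory.
tmp_dir = tempfile.mkdtemp()
model_paths = []
for it in (10000, 20000, 30000):
    path = os.path.join(tmp_dir, 'toy_iter_{}.caffemodel'.format(it))
    with open(path, 'w') as f:
        f.write('placeholder')
    model_paths.append(path)

results = queue.Queue()
keep_final_snapshot(model_paths, results)
message = results.get()

print(os.listdir(tmp_dir))
print(message['model_path'] == model_paths[-1])
```

Passing a path instead of the model itself keeps the message small, and it works across processes because the file outlives the process that wrote it.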
if __name__ == '__main__':
    args = parse_args()
    print('Called with args:')
    print(args)
    # A word on lib/fast_rcnn/config.py: it mainly holds the global
    # configuration. Importing cfg from it at the top of a file is enough to
    # access the settings; cfg_from_file and cfg_from_list are both functions
    # provided by that file.
    if args.cfg_file is not None:
        cfg_from_file(args.cfg_file)
    if args.set_cfgs is not None:
        cfg_from_list(args.set_cfgs)
    cfg.GPU_ID = args.gpu_id
    # --------------------------------------------------------------------------
    # Pycaffe doesn't reliably free GPU memory when instantiated nets are
    # discarded (e.g. "del net" in Python code). To work around this issue, each
    # training stage is executed in a separate process using
    # multiprocessing.Process.
    # --------------------------------------------------------------------------
    # Each training stage runs in its own process, one after another, so that
    # its GPU memory is released when the process exits; mp_queue is the data
    # structure used to pass results between processes.
    # queue for communicated results between processes
    mp_queue = mp.Queue()
    # solvers, iters, etc. for each training stage
    solvers, max_iters, rpn_test_prototxt = get_solvers(args.net_name)
    print
    print 'Stage 1 RPN, init from ImageNet model'
    print
    # Stage 1 begins
    cfg.TRAIN.SNAPSHOT_INFIX = 'stage1'
    mp_kwargs = dict(
            queue=mp_queue,
            imdb_name=args.imdb_name,
            init_model=args.pretrained_model,
            solver=solvers[0],
            max_iters=max_iters[0],
            cfg=cfg)
    # Step 1: fine-tune the RPN from the ImageNet model M0 to get model M1
    p = mp.Process(target=train_rpn, kwargs=mp_kwargs)
    p.start()
    rpn_stage1_out = mp_queue.get()
    p.join()
    print
    print 'Stage 1 RPN, generate proposals'
    print
    # Next, use model M1 from the first step to generate proposals P1
    mp_kwargs = dict(
            queue=mp_queue,
            imdb_name=args.imdb_name,
            rpn_model_path=str(rpn_stage1_out['model_path']),
            cfg=cfg,
            rpn_test_prototxt=rpn_test_prototxt)
    p = mp.Process(target=rpn_generate, kwargs=mp_kwargs)
    p.start()
    rpn_stage1_out['proposal_path'] = mp_queue.get()['proposal_path']
    p.join()
    print
    print 'Stage 1 Fast R-CNN using RPN proposals, init from ImageNet model'
    print
    # Train Fast R-CNN from the ImageNet model M0 on the proposals P1 from the
    # previous step to get model M2
    cfg.TRAIN.SNAPSHOT_INFIX = 'stage1'
    mp_kwargs = dict(
            queue=mp_queue,
            imdb_name=args.imdb_name,
            init_model=args.pretrained_model,
            solver=solvers[1],
            max_iters=max_iters[1],
            cfg=cfg,
            rpn_file=rpn_stage1_out['proposal_path'])
    p = mp.Process(target=train_fast_rcnn, kwargs=mp_kwargs)
    p.start()
    fast_rcnn_stage1_out = mp_queue.get()
    p.join()
    print
    print 'Stage 2 RPN, init from stage 1 Fast R-CNN model'
    print
    # Train the RPN starting from model M2. Unlike the stage 1 RPN training,
    # the conv layer parameters are frozen this time and only take part in
    # the forward pass. The result is model M3.
    cfg.TRAIN.SNAPSHOT_INFIX = 'stage2'
    mp_kwargs = dict(
            queue=mp_queue,
            imdb_name=args.imdb_name,
            init_model=str(fast_rcnn_stage1_out['model_path']),
            solver=solvers[2],
            max_iters=max_iters[2],
            cfg=cfg)
    p = mp.Process(target=train_rpn, kwargs=mp_kwargs)
    p.start()
    rpn_stage2_out = mp_queue.get()
    p.join()
    print
    print 'Stage 2 RPN, generate proposals'
    print
    # Use model M3 to generate proposals P2
    mp_kwargs = dict(
            queue=mp_queue,
            imdb_name=args.imdb_name,
            rpn_model_path=str(rpn_stage2_out['model_path']),
            cfg=cfg,
            rpn_test_prototxt=rpn_test_prototxt)
    p = mp.Process(target=rpn_generate, kwargs=mp_kwargs)
    p.start()
    rpn_stage2_out['proposal_path'] = mp_queue.get()['proposal_path']
    p.join()
    print
    print 'Stage 2 Fast R-CNN, init from stage 2 RPN R-CNN model'
    print
    # Train Fast R-CNN from model M3 on proposals P2 to get the final model M4
    cfg.TRAIN.SNAPSHOT_INFIX = 'stage2'
    mp_kwargs = dict(
            queue=mp_queue,
            imdb_name=args.imdb_name,
            init_model=str(rpn_stage2_out['model_path']),
            solver=solvers[3],
            max_iters=max_iters[3],
            cfg=cfg,
            rpn_file=rpn_stage2_out['proposal_path'])
    p = mp.Process(target=train_fast_rcnn, kwargs=mp_kwargs)
    p.start()
    fast_rcnn_stage2_out = mp_queue.get()
    p.join()
    # Write out the final model
    # Create final model (just a copy of the last stage)
    final_path = os.path.join(
            os.path.dirname(fast_rcnn_stage2_out['model_path']),
            args.net_name + '_faster_rcnn_final.caffemodel')
    print 'cp {} -> {}'.format(
            fast_rcnn_stage2_out['model_path'], final_path)
    shutil.copy(fast_rcnn_stage2_out['model_path'], final_path)
    print 'Final model: {}'.format(final_path)
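Since the data flow of the main block is spread over more than a hundred lines, a compact sketch may help. The three functions below are toy stand-ins that only record what each step is initialized from and which proposals it consumes. No training happens, everything runs in one process, and the labels M0 to M4 and P1, P2 are just the names used in the comments above.

```python
def train_rpn(init_model, out):
    """Stand-in for RPN training: records the initialization model."""
    return {'model_path': out, 'init': init_model}


def rpn_generate(rpn_model_path, out):
    """Stand-in for proposal generation with a trained RPN."""
    return {'proposal_path': out, 'rpn': rpn_model_path}


def train_fast_rcnn(init_model, rpn_file, out):
    """Stand-in for Fast R-CNN training on precomputed proposals."""
    return {'model_path': out, 'init': init_model, 'rpn_file': rpn_file}


def alternating_optimization(pretrained_model):
    """Chain the steps in the same order as the main block."""
    # Stage 1: RPN from the ImageNet model, then proposals, then Fast R-CNN.
    rpn1 = train_rpn(pretrained_model, 'M1')
    rpn1['proposal_path'] = rpn_generate(rpn1['model_path'], 'P1')['proposal_path']
    frcnn1 = train_fast_rcnn(pretrained_model, rpn1['proposal_path'], 'M2')
    # Stage 2: RPN from the stage 1 Fast R-CNN, then proposals, then Fast R-CNN.
    rpn2 = train_rpn(frcnn1['model_path'], 'M3')
    rpn2['proposal_path'] = rpn_generate(rpn2['model_path'], 'P2')['proposal_path']
    frcnn2 = train_fast_rcnn(rpn2['model_path'], rpn2['proposal_path'], 'M4')
    return [rpn1, frcnn1, rpn2, frcnn2]


stages = alternating_optimization('M0')
for stage in stages:
    print(stage)
```

Reading the four records side by side makes the alternation visible: both stage 1 networks start from M0, while each stage 2 network starts from the model trained just before it.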