[Paper Summary] Adaptive region aggregation for MVS matching using deformable convolutional network (2023)


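1. First author: Han Hu

2. Year of publication: 2023

3. Journal: The Photogrammetric Record

4. Keywords: MVS, 3D reconstruction, deep learning, adaptive convolution

5. Motivation: Although adaptive matching windows have produced encouraging results, determining the geometric priors remains a challenge.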

Ideally, the estimation of the photo-consistency must be confined within a certain region, for example, within the boundaries of the target object. Low-level geometrical features, such as line contours, planar segments or superpixels, are typically used to adaptively select the supporting domain. However, low-level features are vulnerable to noise and may not correspond to the object boundaries. For instance, even correctly defined line contours may not essentially represent the contours of discontinuous regions. Therefore, the determination of meaningful geometrical priors requires high-level semantic understanding of the object rather than low-level geometric clues.

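6. Objective: Improve the effectiveness of the matching features.

7. Core idea: The paper proposes an adaptive region aggregation method for MVS that uses deformable convolutional networks (DCNs).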

  1. a learnable adaptive region aggregation method for MVSNet based on DCNs for effectively matching descriptors;
  2. a dedicated offset regulariser for the learnable offsets of the DCN to enhance its convergence.

8. Experimental results:

The proposed method outperforms the state-of-the-art method in dynamic areas with a significant error reduction of 21.3% while retaining its superiority in overall performance on KITTI. It also achieves the best generalization ability on the DDAD dataset in dynamic areas among the competing methods.





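1. Deformable feature extractor with adaptive aggregation windows

The impressive feature-learning capability of CNNs makes it possible to create high-level features. As noted earlier, determining a suitable support domain for computing photo-consistency requires a semantic understanding of the scene; for example, adaptive aggregation of the window must exploit the latent features obtained through the CNN layers. DCNs introduce additional pixel offsets into the convolution and select features at the shifted positions, which yields an irregular receptive field. A DCN is therefore used to model the adaptive aggregation window for image matching. The figure below illustrates how a 3 × 3 deformable convolution learns offsets from N-dimensional features and then performs the convolution.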

[Figure 1: a 3 × 3 deformable convolution with offsets learned from N-dimensional features]

[Figure 2]

[Figure 3]
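
To make the sampling step concrete, here is a minimal pure-Python sketch of a 3 × 3 deformable convolution evaluated at a single pixel. It is only an illustration, not the paper's implementation: the names `bilinear` and `deform_conv3x3_at` are hypothetical, the offsets are set by hand (in a DCN they would be predicted by a learned convolution layer), and samples that fall outside the feature map are assumed to contribute zero.

```python
import math


def bilinear(feat, y, x):
    """Sample a 2D feature map at a fractional (y, x) position.

    Assumption: positions outside the map contribute 0.
    """
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    dy, dx = y - y0, x - x0
    value = 0.0
    for yy, wy in ((y0, 1.0 - dy), (y0 + 1, dy)):
        for xx, wx in ((x0, 1.0 - dx), (x0 + 1, dx)):
            if 0 <= yy < h and 0 <= xx < w:
                value += wy * wx * feat[yy][xx]
    return value


def deform_conv3x3_at(feat, kernel, offsets, cy, cx):
    """Response of a 3 x 3 deformable convolution at pixel (cy, cx).

    offsets[k] = (dy, dx) shifts the k-th kernel point (row-major order)
    away from its regular grid position before sampling.
    """
    out = 0.0
    k = 0
    for i in (-1, 0, 1):
        for j in (-1, 0, 1):
            dy, dx = offsets[k]
            out += kernel[i + 1][j + 1] * bilinear(feat, cy + i + dy, cx + j + dx)
            k += 1
    return out


# Toy 5 x 5 feature map with value 5 * row + col, and an averaging kernel.
feat = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
kernel = [[1.0 / 9.0] * 3 for _ in range(3)]

zero_offsets = [(0.0, 0.0)] * 9   # reduces to an ordinary convolution
shifted = [(0.0, 0.5)] * 9        # every kernel point moved half a pixel right

regular = deform_conv3x3_at(feat, kernel, zero_offsets, 2, 2)
deformed = deform_conv3x3_at(feat, kernel, shifted, 2, 2)
```

With all offsets at zero the function behaves like a regular 3 × 3 convolution; non-zero offsets move the sampling points and so change which part of the feature map supports the response, which is the irregular receptive field described above.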

2. Loss function



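Here, the offset distance of each kernel point p is denoted $o_k(p) = \sqrt{x^2 + y^2}$, and offsets smaller than 3 pixels are truncated to 0. The two losses above are balanced by an empirical weight and used jointly for back-propagation: $L = L_D + 10\,L_O$.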
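
The truncation and the weighted sum can be sketched in a few lines of Python. This is a hedged illustration only: the exact form of the offset regulariser is not reproduced in this summary, so averaging the truncated distances over the kernel points is an assumption, and the function names (`offset_distance`, `offset_regulariser`, `total_loss`) and sample values are hypothetical.

```python
import math


def offset_distance(dy, dx, threshold=3.0):
    """Offset distance sqrt(x^2 + y^2); distances below the threshold are truncated to 0."""
    d = math.hypot(dx, dy)
    return 0.0 if d < threshold else d


def offset_regulariser(offsets, threshold=3.0):
    """Assumption: the regulariser is the mean truncated distance over the kernel points."""
    if not offsets:
        return 0.0
    return sum(offset_distance(dy, dx, threshold) for dy, dx in offsets) / len(offsets)


def total_loss(depth_loss, offset_loss, weight=10.0):
    """Combined loss L = L_D + 10 * L_O, using the empirical weight from the text."""
    return depth_loss + weight * offset_loss


# Hand-picked (dy, dx) offsets: two fall below 3 pixels and are not penalised.
offsets = [(0.5, 1.0), (3.0, 4.0), (0.0, 2.9), (0.0, 6.0)]
l_o = offset_regulariser(offsets)
total = total_loss(0.5, l_o)  # 0.5 is a placeholder depth loss value
```

Under this reading, small offsets are left free so the window can adapt locally, while only offsets of 3 pixels or more add to the penalty.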
