[Paper Summary] High-frequency Stereo Matching Network (CVPR 2023)


1. First author: Haoliang Zhao

2. Publication year: 2023

3. Venue: CVPR

4. Keywords: stereo matching, MVS, deep learning, high-frequency information, LSTM

5. Motivation:

(1) Most current approaches fall short on the finer features of the estimated disparity map, especially at object edges. In bokeh and rendering applications, the edge quality of the disparity map is critical to the final result. For example, technologies that require pixel-level rendering, such as VR and AR, place high demands on the fit between the scene model and the image mapping, which means the edges in the disparity map must fit tightly to those in the original RGB image.

(2) Mismatches in textureless regions and missing thin objects are also important factors that significantly degrade the disparity map. For example, mismatched weakly textured walls and missing thin electrical wires are fatal flaws for obstacle-avoidance applications.

6. Goal: to address blurred edges, missing thin objects, and mismatches in textureless regions.


7. Core idea: the paper proposes a new end-to-end, data-driven stereo matching method, DLNR (Stereo Matching Network with decoupling LSTM and Normalization Refinement).

  1. Most current iterative methods use the original GRU structure as their iterative cell. The problem is that in the original GRU structure, the information used to generate the update matrix of the disparity map is coupled with the hidden state passed between iterations, making it hard to keep subtle details in the hidden state. Therefore, we designed the Decouple LSTM module to decouple the hidden state from the update matrix of the disparity map. Decouple LSTM keeps more high-frequency information in the iterative stage through data decoupling. However, to balance performance and computational speed, the resolution of the iterative stage is at most 1/4 of the original resolution.
  2. Due to the large differences in disparity ranges between different images and datasets, the Refinement module often generalizes poorly when it encounters images with different disparity ranges. In particular, during fine-tuning, the module may even fail when the disparity ranges differ greatly. To address this problem, we propose the Disparity Normalization strategy. Experiments and visualizations show that the module improves performance and alleviates the problem of domain difference.
  3. Most learning-based methods still use ResNet-like feature extractors, which fall short in providing information for well-designed post-stage structures. To alleviate the problem, we propose the Channel-Attention Transformer feature extractor, which aims to capture long-range pixel dependencies and preserve high-frequency information.

8. Experimental results:

Our method (DLNR) ranks 1st on the Middlebury leaderboard, significantly outperforming the next best method by 13.04%. Our method also achieves SOTA performance on the KITTI-2015 benchmark for D1-fg.





1. Network architecture



2. Channel-Attention Transformer extractor



2.1. Preserving high-frequency information

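To obtain sharp edges and handle weakly textured regions better, it is crucial to preserve high-frequency information throughout processing. The most intuitive approach is to keep high resolution across the whole architecture, but this leads to extremely high computational cost. Downsampling with strided convolutions or pooling, on the other hand, inevitably loses information and degrades performance. To mitigate this, Pixel Unshuffle is used to downsample the image to 1/4 of its original size while expanding the channels, without losing any high-frequency information. Specifically, an image of shape [C, H∗r, W∗r] is reshaped to [C∗r^2, H, W] by Pixel Unshuffle.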
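The reshaping described above can be sketched in a few lines of NumPy. This is only an illustration of the [C, H∗r, W∗r] to [C∗r^2, H, W] rearrangement, not code from the paper; the function names `pixel_unshuffle` and `pixel_shuffle` and the channel ordering are assumptions chosen for this example.

```python
import numpy as np


def pixel_unshuffle(image: np.ndarray, r: int) -> np.ndarray:
    """Rearrange [C, H*r, W*r] into [C*r*r, H, W] without discarding any value."""
    c, hr, wr = image.shape
    if hr % r or wr % r:
        raise ValueError("height and width must be divisible by r")
    h, w = hr // r, wr // r
    # Split each spatial axis into (coarse position, offset inside the r x r block).
    blocks = image.reshape(c, h, r, w, r)
    # Move both in-block offsets next to the channel axis, then merge them into it.
    return blocks.transpose(0, 2, 4, 1, 3).reshape(c * r * r, h, w)


def pixel_shuffle(features: np.ndarray, r: int) -> np.ndarray:
    """Inverse rearrangement, used here only to show that nothing was lost."""
    crr, h, w = features.shape
    c = crr // (r * r)
    blocks = features.reshape(c, r, r, h, w)
    return blocks.transpose(0, 3, 1, 4, 2).reshape(c, h * r, w * r)


# Hypothetical 3-channel 8x8 input, downsampled to 1/4 resolution (r = 4).
image = np.arange(3 * 8 * 8, dtype=np.float32).reshape(3, 8, 8)
features = pixel_unshuffle(image, r=4)
restored = pixel_shuffle(features, r=4)
```

Because the operation is a pure permutation of the input values, the inverse rearrangement recovers the image exactly, which is what distinguishes it from strided convolution or pooling.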

2.2. Channel attention mechanism

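Conventional self-attention maintains an attention map of size HW × HW, which leads to quadratic complexity and makes it unsuitable for vision tasks that require high resolution. The CWSA module is therefore adopted; it derives from the MDTA module first proposed in Restormer [42] and computes self-attention across the channel dimension with linear complexity.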
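A minimal NumPy sketch of attention computed across channels is given below. It is a simplification written for this summary: the learned projections that would produce the queries, keys, and values are omitted, and the function name `channel_self_attention` and the `temperature` parameter are assumptions rather than the paper's API. The point is only that the attention map has shape C × C, so the cost grows linearly with the number of pixels HW.

```python
import numpy as np


def channel_self_attention(q, k, v, temperature=1.0):
    """Attention over channels. q, k, v have shape [C, H, W]; output has shape [C, H, W]."""
    c, h, w = q.shape
    q = q.reshape(c, h * w)
    k = k.reshape(c, h * w)
    v = v.reshape(c, h * w)
    # L2-normalise each channel so that the scores are cosine similarities.
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=1, keepdims=True) + 1e-8)
    # The attention map is C x C, independent of the spatial resolution.
    scores = (q @ k.T) * temperature
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each output channel is a weighted mix of the value channels.
    return (weights @ v).reshape(c, h, w), weights


# Hypothetical feature map with 4 channels and 6 x 5 spatial positions.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 5))
out, attn = channel_self_attention(x, x, x)
```

The two matrix products cost on the order of C²·HW operations, compared with (HW)²·C for spatial self-attention.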

3. Multiscale Decouple LSTM Regularization


3.1. Multiscale design


3.2. Decoupling mechanism

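In the original GRU structure used by most iterative vision networks, the hidden state h is used to generate the update matrix of the disparity (the output of the GRU cell), while h is also the hidden state of the GRU network (carrying information to the next iteration). The ablation experiments show that this coupling has a significant impact on network performance.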
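The difference between one coupled state and two separate states can be illustrated with a toy per-pixel loop. The sketch below uses a textbook LSTM cell as a stand-in: the cell state c carries memory between iterations, while only h feeds the disparity update (h is still fed back into the gates, as in any LSTM). The paper's actual Decouple LSTM cell may differ; the names `lstm_step`, `iterate_disparity`, and `head`, as well as the random weights, are assumptions for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h, c, weights, bias):
    """One textbook LSTM step for a single pixel. x: [n_in]; h, c: [n_hidden]."""
    z = weights @ np.concatenate([x, h]) + bias
    i, f, o, g = np.split(z, 4)  # input, forget, output gates and candidate
    c_next = sigmoid(f) * c + sigmoid(i) * np.tanh(g)  # memory carried across iterations
    h_next = sigmoid(o) * np.tanh(c_next)  # state exposed to the update head
    return h_next, c_next


def iterate_disparity(features, n_iters, n_hidden, seed=0):
    """Refine a scalar disparity iteratively; only h drives each update."""
    rng = np.random.default_rng(seed)
    n_in = features.shape[0] + 1  # context features plus the current disparity
    weights = rng.normal(scale=0.1, size=(4 * n_hidden, n_in + n_hidden))
    bias = np.zeros(4 * n_hidden)
    head = rng.normal(scale=0.1, size=n_hidden)  # hypothetical linear update head
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    disparity = 0.0
    history = []
    for _ in range(n_iters):
        x = np.append(features, disparity)
        h, c = lstm_step(x, h, c, weights, bias)
        disparity += float(head @ h)  # the update is read from h, not from c
        history.append(disparity)
    return history, h, c


history, h_final, c_final = iterate_disparity(np.array([0.5, -1.0, 2.0]), n_iters=5, n_hidden=8)
```

In a GRU, by contrast, a single vector would play the roles of both `h` and `c` here, so whatever is needed for the update head and whatever must be remembered across iterations compete for the same state.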


4. Disparity Normalization Refinement




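The information in the normalized disparity map Dfr, the error map El, and the left image Il is then combined and processed by an hourglass network to obtain the normalized refined disparity map Dfr'.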
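The normalization formulas themselves were given as images in the original post and are not reproduced here. The sketch below therefore assumes a plain min-max normalization to [0, 1], applied before refinement and undone afterwards. The function names are hypothetical, and `refine_fn` is a placeholder for the hourglass network, which would also take the error map and the left image as inputs.

```python
import numpy as np


def normalize_disparity(disparity, eps=1e-6):
    """Map a disparity map to [0, 1] (assumed min-max scheme); return the map and its statistics."""
    d_min = float(disparity.min())
    scale = max(float(disparity.max()) - d_min, eps)  # guard against constant maps
    return (disparity - d_min) / scale, d_min, scale


def denormalize_disparity(normalized, d_min, scale):
    """Undo normalize_disparity using the stored statistics."""
    return normalized * scale + d_min


def refine_with_normalization(disparity, refine_fn):
    """Run a refinement function on the normalized map, then restore the original range."""
    normalized, d_min, scale = normalize_disparity(disparity)
    refined = refine_fn(normalized)
    return denormalize_disparity(refined, d_min, scale)


# Two hypothetical maps with the same structure but a 10x difference in disparity range.
near = np.array([[0.0, 8.0], [16.0, 32.0]])
far = near * 10.0
near_norm, _, _ = normalize_disparity(near)
far_norm, _, _ = normalize_disparity(far)
round_trip = refine_with_normalization(far, lambda d: d)  # identity stands in for the network
```

Under this assumption the refinement stage sees the same input for both maps, which is one way to read the claim that normalization reduces sensitivity to differing disparity ranges.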



5. Loss Function



7. Experiments

7.1. Comparison with state-of-the-art methods


7.2. Ablation study


