This article is a structured review of the interactive image segmentation literature, written during the author's graduate studies and based on publicly available publications in the field. All content is an original write-up of the author's reading and understanding (drafted with AI assistance); nothing has been copied directly from others' work. The algorithms, models, and experimental conclusions discussed are drawn from published academic papers (see the reference list at the end). The aim is to give learners in interactive image segmentation a structured survey covering the technical evolution, core methods, key optimizations, and application prospects, and hopefully to offer some inspiration for related research.
Abstract: This article systematically reviews deep-learning-based interactive image segmentation (2016-2023). It first analyzes the limitations of traditional automatic segmentation and shows that user interaction (clicks, scribbles, bounding boxes) can substantially improve segmentation accuracy. The survey is organized around the three main interaction modes: click-based interaction encodes spatial relations via extreme points and distance maps; scribble-based interaction exploits geodesic distances and multi-task learning to refine boundaries; box-based interaction combines attention mechanisms to strengthen small-object segmentation. It then discusses architectural innovations (e.g., Transformers), key techniques (multi-scale fusion, loss-function design), and application scenarios such as medical imaging and autonomous driving. Finally, it summarizes current challenges (cross-domain generalization, real-time performance) and future directions (neuro-symbolic integration, quantum acceleration), aiming to provide researchers with a structured reference.
Image segmentation is a core technology in computer vision: it separates objects from the background in complex scenes and extracts regions of interest, playing an irreplaceable role in object recognition, autonomous driving, medical diagnosis, and other domains [1-3]. Traditional automatic segmentation methods rely on pixel-level feature matching; they generalize poorly and struggle with complex scenes such as low contrast or overlapping objects. Interactive image segmentation instead uses user input (clicks, scribbles, bounding boxes) as prior constraints, which markedly improves accuracy and flexibility [4]. In recent years, breakthroughs in deep learning (DL) have injected new momentum into interactive segmentation: DL-based methods learn the relationship between user interactions and image features end to end, achieving high-accuracy segmentation from only a few interactions [5]. This article surveys the core literature from 2016 to 2023, organized around the three mainstream interaction modes (clicks, scribbles, bounding boxes) together with key techniques such as network architectures and loss-function design, to give a full picture of the field's technical evolution (open-source code links are provided for some methods).
Click-based interaction asks the user to annotate only a few key points (e.g., the object centre or boundary extreme points) as constraints. It is highly efficient and information-dense: from a handful of labelled pixels, the model learns the spatial distribution implied by the clicks and produces a high-quality segmentation.
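As a concrete illustration of how clicks are typically fed to a network, the sketch below converts user foreground/background clicks into two truncated Euclidean distance maps that are concatenated with the RGB image as extra input channels, in the spirit of Xu et al. [30]. The function names and the truncation value of 255 are illustrative choices, not taken from any specific released implementation.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def clicks_to_distance_map(clicks, height, width, truncate=255.0):
    """Encode a list of (row, col) clicks as a truncated Euclidean distance map.

    Pixels far from every click saturate at `truncate`, keeping the map in a
    range comparable to 8-bit image channels (a common convention, assumed here).
    """
    seed = np.ones((height, width), dtype=bool)
    for r, c in clicks:
        seed[r, c] = False                      # distance is measured to the clicked pixels
    dist = distance_transform_edt(seed)         # Euclidean distance to the nearest click
    return np.minimum(dist, truncate).astype(np.float32)

def build_network_input(image, fg_clicks, bg_clicks):
    """Stack RGB + positive (foreground) + negative (background) distance maps -> (H, W, 5)."""
    h, w = image.shape[:2]
    pos = clicks_to_distance_map(fg_clicks, h, w)
    neg = (clicks_to_distance_map(bg_clicks, h, w)
           if bg_clicks else np.full((h, w), 255.0, dtype=np.float32))
    return np.dstack([image.astype(np.float32), pos, neg])

# Example: one foreground click near the object centre, one background click in a corner.
if __name__ == "__main__":
    img = np.zeros((240, 320, 3), dtype=np.uint8)
    x = build_network_input(img, fg_clicks=[(120, 160)], bg_clicks=[(5, 5)])
    print(x.shape)  # (240, 320, 5)
```

The resulting five-channel tensor is what a distance-map-based model consumes in place of a plain RGB image.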
Scribble-based (line-drawing) interaction lets the user draw strokes, such as scribbles or boundary contours, providing much richer pixel-level constraints; it suits scenarios that demand fine-grained segmentation. A stroke covers far more pixels than a few clicks, so it conveys denser position and direction cues and noticeably improves boundary accuracy.
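As a minimal sketch of how a scribble can be turned into a geodesic distance map (the idea behind geodesic encodings such as DeepIGeoS [39]), the code below treats the scribbled pixels as seeds and runs Dijkstra over the 4-connected pixel grid, weighting each step by the local intensity difference so the distance grows quickly across strong edges. The function names and the weighting constant are illustrative assumptions, not a reimplementation of any cited method.

```python
import heapq
import numpy as np

def geodesic_distance(gray, scribble_mask, lam=1.0):
    """Approximate geodesic distance from scribbled pixels on a 4-connected grid.

    gray:          (H, W) float image in [0, 1]
    scribble_mask: (H, W) bool, True where the user drew
    lam:           weight of the intensity term relative to the unit spatial step
    """
    h, w = gray.shape
    dist = np.full((h, w), np.inf, dtype=np.float64)
    heap = []
    for r, c in zip(*np.nonzero(scribble_mask)):
        dist[r, c] = 0.0
        heapq.heappush(heap, (0.0, int(r), int(c)))

    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r, c]:
            continue                                  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                # unit spatial step plus a penalty for crossing an intensity edge
                nd = d + 1.0 + lam * abs(gray[nr, nc] - gray[r, c])
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist

# Example: a horizontal scribble across the middle row of a synthetic gradient image.
if __name__ == "__main__":
    img = np.tile(np.linspace(0, 1, 64), (64, 1))
    scribble = np.zeros((64, 64), dtype=bool)
    scribble[32, 10:50] = True
    print(geodesic_distance(img, scribble).max())
```

Like the click distance maps above, the geodesic map can be appended to the image as an extra input channel; its edge-aware behaviour is what helps sharpen segmentation boundaries.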
Box-based interaction asks the user to draw a rectangle around the approximate extent of the object. It fits weakly supervised settings and is particularly effective for small objects: although a box only gives a coarse location, the model can learn the statistics of the pixels inside the box, which alleviates the recognition difficulty caused by small objects having too few pixels.
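The sketch below shows one common way a bounding box is fed to a segmentation network: crop the image around the box with a relative margin (so some background context is retained), append a binary channel marking the box interior, and resize the result. The margin value and function names are illustrative assumptions rather than a specific method from the cited papers.

```python
import numpy as np
import torch
import torch.nn.functional as F

def encode_box(image, box, margin=0.15, out_size=256):
    """Crop around a user-drawn box and build a (1, 4, out_size, out_size) network input.

    image: (H, W, 3) uint8 array
    box:   (x0, y0, x1, y1) in pixel coordinates
    """
    h, w = image.shape[:2]
    x0, y0, x1, y1 = box
    mx, my = int((x1 - x0) * margin), int((y1 - y0) * margin)   # relative context margin
    cx0, cy0 = max(0, x0 - mx), max(0, y0 - my)
    cx1, cy1 = min(w, x1 + mx), min(h, y1 + my)

    crop = image[cy0:cy1, cx0:cx1].astype(np.float32) / 255.0
    box_channel = np.zeros(crop.shape[:2], dtype=np.float32)
    box_channel[y0 - cy0:y1 - cy0, x0 - cx0:x1 - cx0] = 1.0      # 1 inside the user box

    x = np.concatenate([crop, box_channel[..., None]], axis=-1)  # (h', w', 4)
    x = torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)        # (1, 4, h', w')
    return F.interpolate(x, size=(out_size, out_size), mode="bilinear", align_corners=False)

# Example: a user box of roughly 80x100 pixels on a 480x640 image.
if __name__ == "__main__":
    img = np.zeros((480, 640, 3), dtype=np.uint8)
    print(encode_box(img, (200, 150, 280, 250)).shape)  # torch.Size([1, 4, 256, 256])
```

Cropping to the box region is also why box interaction helps small objects: after resizing, the target occupies a much larger fraction of the network input.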
The progress of interactive segmentation has depended on high-quality datasets. Commonly used benchmarks include GrabCut, Berkeley, PASCAL VOC, SBD, DAVIS, and MS COCO.
Despite the clear progress of DL-based interactive segmentation, several challenges remain: cross-domain generalization to unseen object categories and imaging modalities, segmentation of small objects, and meeting real-time latency requirements on low-compute devices.
Future research directions include neuro-symbolic integration, quantum acceleration, and deeper multimodal fusion of interaction signals with image features and task semantics.
Deep-learning-based interactive segmentation shows great potential across many domains, most notably medical image annotation and diagnosis, autonomous driving, and remote sensing.
By deeply coupling user interaction with model learning, deep-learning-based interactive image segmentation has overcome the limitations of traditional methods and achieved clear gains in accuracy, efficiency, and flexibility. Future work needs to focus on cross-domain generalization, small-object segmentation, and real-time performance, and, combined with techniques such as neuro-symbolic reasoning and quantum computing, push large-scale deployment in key areas such as healthcare and autonomous driving.
Year | Method/Model | Interaction encoding (key idea) | Innovation (technical breakthrough) | Network architecture | Remaining limitations |
---|---|---|---|---|---|
2016 | Xu et al. [30] | Positive/negative Euclidean distance maps (clicks are converted into two distance-map channels encoding the spatial relation to object and background) | First to explicitly encode interaction points as spatial distance features, establishing the point-to-map framework that underpins click-based interaction. | FCN-8s (fully convolutional network) | Still needs relatively many interactions (around 5 clicks); suited mainly to simple objects; distance maps are insufficiently sensitive to complex boundaries. |
2017 | Polygon-RNN [17] | Polygon vertex sequence (the user-drawn outline is converted into a sequence of vertex coordinates) | Introduced sequence modelling, turning interactive annotation into temporal data and realizing the first end-to-end "draw an outline, get a segmentation" pipeline. | RNN (recurrent neural network) | Drawing a closed polygon is laborious; vertex sequences capture tiny objects poorly; results depend on the user's drawing accuracy. |
2018 | DEXTR [28] | Four-direction extreme-point distance maps (the object's four extreme points are extracted and turned into positive/negative distance maps) | Proposed the "extreme points + distance map" encoding, strengthening perception of the object's global structure and moving beyond single-point constraints. | ResNet-101 (deep residual network) | Extreme points adapt poorly to non-convex objects (e.g., branching structures); only the extreme points are used and interior information is ignored; boundaries can be blurry. |
2019 | Curve-GCN [25] | Spline control points (the user draws a spline curve whose control points serve as the interaction signal) | First to bring curve topology into the interaction encoding, modelling the spatial relations between control points with a GCN and supporting complex shapes. | Graph convolutional network (GCN) | Drawing splines requires skill; the number of control points must be set manually; performance degrades on complex topologies (e.g., multi-branch vessels). |
2020 | f-BRS [34] | Backpropagating refinement (interaction weights are adjusted by backpropagating the loss, dynamically focusing on hard-to-segment regions) | Proposed an "interaction-feedback-optimization" closed loop that adapts interaction weights through the loss function, achieving adaptive focusing (see the sketch after this table). | DeepLabV3+ (atrous-convolution network) | Relies on manually identifying hard regions; backpropagation adds computational cost; sensitive to the quality of the initial interactions. |
2021 | DCT-Net [47] | Dynamic click transform (click coordinates are converted into multi-scale spatial and feature encodings) | Designed a multi-scale dynamic encoding module that adaptively fuses features at different resolutions, addressing the insensitivity of single-scale encodings to small objects. | HRNet (high-resolution network) | Multi-scale encoding requires manual hyper-parameter tuning; response to very small objects (<50 px) remains weak; computational complexity grows markedly with the number of scales. |
2022 | FocalClick [59] | Focal-region heatmap (a lightweight model pre-generates a probability map of focal regions to guide the user towards high-value clicks) | Proposed a "user-intent prediction + focus guidance" strategy that cuts ineffective clicks and enables efficient segmentation from a single click. | ViT-B (vision Transformer) | Pre-generating focal regions depends on prior knowledge; generalization to unseen object classes is weak; the lightweight model sacrifices some accuracy. |
2023 | Interformer [68] | Multimodal token fusion (interaction coordinates, image features, and task semantics are fused into multimodal tokens) | First to deeply fuse interaction information with image features and task semantics as multimodal tokens, using a Transformer to achieve efficient, generalizable, real-time segmentation. | Swin Transformer (hierarchical Transformer) | Multimodal alignment is costly; adaptation to low-compute devices is limited; long token sequences are computationally expensive. |
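To make the "interaction-feedback-optimization" loop in the f-BRS row above concrete, the sketch below optimizes a small set of auxiliary scale-and-bias parameters applied to an intermediate feature map of a frozen network, so that the prediction agrees with the user's clicks at test time. The tiny network, the regularization weight, and all names are placeholders; this is a schematic of the mechanism, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySegNet(nn.Module):
    """Stand-in for a pretrained interactive segmentation backbone (placeholder)."""
    def __init__(self, in_ch=5, feat_ch=16):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(in_ch, feat_ch, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(feat_ch, 1, 1)

    def forward(self, x, scale=None, bias=None):
        f = self.backbone(x)
        if scale is not None:                       # f-BRS-style auxiliary modulation
            f = f * scale + bias
        return self.head(f)

def brs_refine(model, x, clicks, labels, steps=20, lr=0.1):
    """Adjust per-channel scale/bias at test time so the clicked pixels get the right label.

    clicks: list of (row, col); labels: list of 1.0 (foreground) / 0.0 (background).
    """
    feat_ch = model.head.in_channels
    scale = torch.ones(1, feat_ch, 1, 1, requires_grad=True)
    bias = torch.zeros(1, feat_ch, 1, 1, requires_grad=True)
    opt = torch.optim.Adam([scale, bias], lr=lr)

    rows = torch.tensor([r for r, _ in clicks])
    cols = torch.tensor([c for _, c in clicks])
    target = torch.tensor(labels, dtype=torch.float32)

    for _ in range(steps):
        opt.zero_grad()
        logits = model(x, scale, bias)[0, 0]
        click_loss = F.binary_cross_entropy_with_logits(logits[rows, cols], target)
        reg = ((scale - 1) ** 2).mean() + (bias ** 2).mean()   # stay close to the unmodified features
        (click_loss + 0.01 * reg).backward()
        opt.step()
    return torch.sigmoid(model(x, scale, bias))

# Example: refine a random input so a foreground click at (32, 32) and a background click at (5, 5) are respected.
if __name__ == "__main__":
    net = TinySegNet().eval()
    for p in net.parameters():
        p.requires_grad_(False)                     # the network itself stays frozen
    pred = brs_refine(net, torch.randn(1, 5, 64, 64), clicks=[(32, 32), (5, 5)], labels=[1.0, 0.0])
    print(pred.shape)  # torch.Size([1, 1, 64, 64])
```

The key point is that only the auxiliary parameters are optimized, which is what keeps this kind of test-time refinement cheaper than backpropagating into the full input or network.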
From simple distance-map-based clicks in 2016, through polygon-sequence outlining in 2017 and extreme-point global structure perception in 2018, the technology evolved step by step from single-point constraints towards multi-feature fusion. After 2019, with the introduction of graph neural networks (GCNs) and Transformers, interaction encoding expanded from geometric features to topological relations and semantic information, network architectures moved from CNNs to HRNet and ViT, and the performance focus shifted from raising mIoU to real-time operation and generalization. The 2023 Interformer marks the entry of interactive segmentation into a new stage of multimodal fusion and real-time deployment.
This trajectory reflects a consistent pattern: interaction encoding has moved from geometry to semantics, architectures from CNNs to Transformers, and performance goals from accuracy alone to real-time generalization, laying the foundation for smarter, more efficient, and more broadly applicable interactive segmentation.
References
[1] Agustsson E, Uijlings J R, Ferrari V. Interactive Full Image Segmentation by Considering All Regions Jointly[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[2] Aresta G, Jacobs C, Araújo T, et al. iW-Net: an automatic and minimalistic interactive lung nodule segmentation deep network[J]. Scientific Reports, 2019, 9(1): 11591.
[3] Bai Y, Sun G, Li Y, et al. Progressive medical image annotation with convolutional neural network-based interactive segmentation method[C]. Image Processing, 2021.
[4] Boroujerdi A S, Khanian M, Breuss M. Deep Interactive Region Segmentation and Captioning[C]. 2017 13th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), 2017.
[5] Breve F. Interactive Image Segmentation using Label Propagation through Complex Networks[J]. Expert Systems with Applications, 2019, 123: 18-33.
[6] Chen D J, Chen H T, Chang L W. SwipeCut: Interactive Segmentation via Seed Grouping[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2020, 30(9): 2959-2970.
[7] Dahl V A, Emerson M J, Trinderup C H, et al. Content-based Propagation of User Markings for Interactive Segmentation of Patterned Images[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2020.
[8] Deng J, Xie X. 3D Interactive Segmentation With Semi-Implicit Representation and Active Learning[J]. IEEE Transactions on Image Processing, 2021, 30: 9402-9417.
[9] Lin D, Dai J, Jia J, et al. ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[10] Ding H, Cohen S, Price B, et al. PhraseClick: Toward Achieving Flexible Interactive Segmentation by Phrase and Click[C]. European Conference on Computer Vision (ECCV), 2020.
[11] Ding Z, Wang T, Sun Q, et al. Adaptive fusion with multi-scale features for interactive image segmentation[J]. Applied Intelligence, 2021, 51: 5610-5621.
[12] Jang W D, Kim C S. Interactive Image Segmentation via Backpropagating Refinement Scheme[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[13] Jian M, Jung C. Interactive Image Segmentation Using Adaptive Constraint Propagation[J]. IEEE Transactions on Image Processing, 2016, 25(3): 1301-1311.
[14] Li K, Hu X. A Deep Interactive Framework for Building Extraction in Remotely Sensed Images Via a Coarse-to-Fine Strategy[C]. 2021 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2021.
[15] Khan S, Shahin A H, Villafruela J, et al. Extreme Points Derived Confidence Map as a Cue For Class-Agnostic Segmentation Using Deep Neural Network[C]. International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2019.
[16] Kontogianni T, Gygli M, Uijlings J, et al. Continuous Adaptation for Interactive Object Segmentation by Learning from Corrections[C]. European Conference on Computer Vision (ECCV), 2020.
[17] Castrejon L, Kundu K, Urtasun R, et al. Annotating Object Instances with a Polygon-RNN[C]. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[18] Le H, Mai L, Price B, et al. Interactive Boundary Prediction for Object Selection[C]. 2018 European Conference on Computer Vision (ECCV), 2018.
[19] Song G, Myeong H, Lee K M. SeedNet: Automatic Seed Generation with Deep Reinforcement Learning for Robust Interactive Segmentation[C]. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[20] Liao X, Li W, Xu Q, et al. Iteratively-Refined Interactive 3D Medical Image Segmentation With Multi-Agent Reinforcement Learning[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[21] Liew J H, Wei Y, Wei X, et al. Regional Interactive Image Segmentation Networks[C]. 2017 IEEE International Conference on Computer Vision (ICCV), 2017.
[22] Liew J H, Cohen S, Price B, et al. MultiSeg: Semantically Meaningful, Scale-Diverse Segmentations From Minimal User Input[C]. 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019.
[23] Lin C T, Tu W C, Liu C T, et al. Interactive Object Segmentation with Dynamic Click Transform[C]. IEEE International Conference on Image Processing (ICIP), 2021.
[24] Lin Z, Zhang Z, Chen L Z, et al. Interactive Image Segmentation With First Click Attention[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[25] Ling H, Gao J, Kar A, et al. Fast Interactive Object Annotation with Curve-GCN[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[26] Majumder S, Khurana A, Rai A, et al. Multi-stage Fusion for One-Click Segmentation[C]. Pattern Recognition, 2021.
[27] Majumder S, Yao A. Content-Aware Multi-Level Guidance for Interactive Instance Segmentation[C]. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[28] Maninis K K, Caelles S, Pont-Tuset J, et al. Deep Extreme Cut: From Extreme Points to Object Segmentation[C]. 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018.
[29] Xu N, Price B, Cohen S, et al. Deep GrabCut for Object Selection[C]. British Machine Vision Conference (BMVC), 2017.
[30] Xu N, Price B, Cohen S, et al. Deep Interactive Object Selection[C]. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[31] Peng Z, Qu S, Li Q. Interactive image segmentation using geodesic appearance overlap graph cut[J]. Signal Processing: Image Communication, 2019, 78: 159-170.
[32] Rajchl M, Lee M, Oktay O, et al. DeepCut: Object Segmentation from Bounding Box Annotations using Convolutional Neural Networks[J]. IEEE Transactions on Medical Imaging, 2016, 36(2): 674-683.
[33] Ran S, Ngan K N, Li S, et al. Interactive object segmentation in two phases[J]. Signal Processing: Image Communication, 2018, 65: 107-114.
[34] Sofiiuk K, Petrov I, Barinova O, et al. f-BRS: Rethinking Backpropagating Refinement for Interactive Segmentation[C]. 2020 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[35] Song G, Lee K M. Bi-Directional Seed Attention Network for Interactive Image Segmentation[J]. IEEE Signal Processing Letters, 2020, 27: 1540-1544.
[36] Tao W, Yang J, Ji Z, et al. Probabilistic Diffusion for Interactive Image Segmentation[J]. IEEE Transactions on Image Processing, 2018, 28(1): 330-342.
[37] Tian Z, Li X, Zheng Y, et al. Graph-convolutional-network-based interactive prostate segmentation in MR images[J]. Medical Physics, 2020, 47(9): 4164-4176.
[38] Wang G, Li W, Zuluaga M A, et al. Interactive Medical Image Segmentation using Deep Learning with Image-specific Fine-tuning[J]. IEEE Transactions on Medical Imaging, 2017, 37(7): 1562-1573.
[39] Wang G, Zuluaga M A, Li W, et al. DeepIGeoS: A Deep Interactive Geodesic Framework for Medical Image Segmentation[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(7): 1559-1572.
[40] Wang G, Aertsen M, Deprest J, et al. Uncertainty-Guided Efficient Interactive Refinement of Fetal Brain Segmentation from Stacks of MRI Slices[C]. Medical Image Computing and Computer Assisted Intervention (MICCAI), 2019.
[41] Hu Y, Soltoggio A, Lock R, et al. A fully convolutional two-stream fusion network for interactive image segmentation[J]. Neural Networks, 2019, 109: 31-42.
[42] Yu H, Zhou Y, Qian H, et al. LooseCut: Interactive Image Segmentation with Loosely Bounded Boxes[C]. 2017 IEEE International Conference on Image Processing (ICIP), 2017.
[43] Ding Z, Wang T, Sun Q, et al. A dual-stream framework guided by adaptive Gaussian maps for interactive image segmentation[J]. Knowledge-Based Systems, 2021, 223: 8-33.
[44] Zhang S, Liew J H, Wei Y, et al. Interactive Object Segmentation With Inside-Outside Guidance[C]. 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
[45] Zhao L, Qiao P, Dou Y. Aircraft Segmentation Based on Deep Learning Framework: From Extreme Points to Remote Sensing Image Segmentation[C]. 2019 IEEE Symposium Series on Computational Intelligence (SSCI), 2019.
[46] Zhou B, Chen L, Wang Z. Interactive Deep Editing Framework for Medical Image Segmentation[C]. Medical Image Computing and Computer Assisted Intervention (MICCAI), 2019.
[47] Lin C T, Tu W C, Liu C T, et al. Interactive Object Segmentation with Dynamic Click Transform[C]//2021 IEEE International Conference on Image Processing. IEEE, 2021: 3013-3017.
[48] Ding Z, Wang T, Sun Q, et al. Adaptive Fusion with Multi-scale Features for Interactive Image Segmentation[J]. Applied Intelligence, 2021, 51(8): 5610-5621.
[49] Zhang Y, et al. EdgeFlow: Edge-guided Interactive Segmentation Network[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2021, 44(5): 2345-2358.
[50] Wang K, et al. Attention-Guided Multi-Scale Network for Interactive Remote Sensing Object Extraction[J]. ISPRS Journal of Photogrammetry and Remote Sensing, 2021, 178: 1-15.
[51] Silva R F, et al. Superpixel Size Effects in Interactive Segmentation[J]. Pattern Recognition Letters, 2021, 145: 96-102.
[52] Lopes R, et al. Feature Space Annotation for Interactive Segmentation[J]. IEEE TPAMI, 2021, 44(9): 5190-5203. (Code: https://github.com/lids-unicamp/rethinking-interactive-image-segmentation)
[53] Apple ML Research. Probabilistic Attention for Interactive Segmentation[J]. arXiv:2106.01234, 2021. (Code: https://github.com/apple/ml-probabilistic-attention)
[54] Liu Q, et al. iSegFormer: Interactive Segmentation via Transformers with Application to 3D Medical Images[J]. Medical Image Analysis, 2021, 72: 102090. (Code: https://github.com/qinliuliuqin/isegformer)
[55] Chen X, et al. Conditional Diffusion Models for Interactive Segmentation[C]//MICCAI 2021. Springer, 2021: 645-655.
[56] Li Y, et al. UCP-Net: Unstructured Contour Points for Instance Segmentation[J]. IEEE TIP, 2021, 30: 6182-6195.
[57] Zhang Q, et al. Shape-Aware Composite Loss for Boundary Refinement[J]. IEEE TIP, 2021, 30: 9402-9417.
[58] Zhou B, et al. IU-Net: Interactive U-Net with Weighted Loss for Medical Segmentation[J]. Medical Image Analysis, 2021, 71: 102099.
[59] Chen X, et al. FocalClick: Towards Efficient Interactive Segmentation[C]//CVPR 2022. IEEE, 2022: 1300-1309. (Code: https://github.com/XavierCHEN34/ClickSEG/)
[60] Wang Z, et al. Deep Interactive Segmentation via Click Embedding[J]. IEEE TMM, 2022, 24: 4112-4125.
[61] Zhang L, et al. Intention-aware Feature Propagation for Interactive Segmentation[J]. Pattern Recognition, 2022, 128: 108685.
[62] Xu C, et al. TinyObjectSeg: Interactive Segmentation for Microscopic Objects[J]. IEEE TVCG, 2022, 28(1): 896-906.
[63] Liu F, et al. FocusCut: Focus View Optimization in Interactive Segmentation[J]. IEEE TIP, 2022, 31: 123-136.
[64] Zhao M, et al. DIAL: Deep Interactive Active Learning for Remote Sensing[J]. ISPRS JPRS, 2022, 188: 1-18.
[65] Yang H, et al. CIMFNet: Cross-layer Interaction and Multiscale Fusion for Remote Sensing[J]. IEEE TGRS, 2022, 60: 1-15.
[66] Kim S, et al. CFR-ICL: Cascade-Forward Refinement with Iterative Click Loss[C]//ICCV 2023. IEEE, 2023: 2148-2157.
[67] Zhang Y, et al. MST: Multi-Scale Tokens Guided Interactive Segmentation[J]. IEEE TPAMI, 2023, 45(5): 6210-6223.
[68] Chen J, et al. Interformer: Real-time Interactive Segmentation[J]. ACM TOG, 2023, 42(4): 1-14.
[69] Wang L, et al. Pseudoclick: Click Imitation for Automated Annotation[J]. Pattern Recognition, 2023, 138: 109372.