SurgiTrack: Fine-grained multi-class multi-tool tracking in surgical videos | Literature Digest - Vision Large Models
Title
SurgiTrack: Fine-grained multi-class multi-tool tracking in surgical videos
01
Introduction
Surgical tool tracking plays a vital role in computer-assisted surgery systems, providing valuable support for a range of applications, including skill assessment (Pedrett et al., 2023), visual servoing (Xu et al., 2023), navigation (Xu et al., 2022), laparoscope positioning (Dutkiewicz et al., 2005), safety and risk-zone estimation (Richa et al., 2011), and augmented reality (Martin-Gomez et al., 2023). Whereas tool detection only identifies the target tools in individual frames, tool tracking goes a step further, also estimating and predicting tool locations in subsequent video frames.
Traditional tool tracking relied on classical machine learning methods built on color, texture, SIFT, and geometric features (Pezzementi et al., 2009; Sznitman et al., 2012; Alsheakhali et al., 2015; Dockter et al., 2014; Du et al., 2016). Recent advances in deep learning (Bouget et al., 2017; Lee et al., 2019; Nwoye et al., 2019; Zhao et al., 2019a,b; Robu et al., 2021; Nwoye, 2021; Fathollahi et al., 2022; Wang et al., 2022; Rueckert et al., 2023) have ushered in a new era, enabling the extraction of more robust features for tool re-identification (re-ID). Despite this notable progress, many challenges remain. Existing work focuses mainly on single-tool tracking (Zhao et al., 2019b), single-class multi-tool tracking (Fathollahi et al., 2022), or multi-class single-tool tracking (Nwoye et al., 2019). In real surgical scenes, however, tools of several categories are typically in use at the same time, which calls for multi-class multi-tool tracking, an area left underexplored for lack of suitable datasets.

Recently, a new dataset named CholecTrack20 (Nwoye et al., 2023) was introduced, providing the support needed for multi-class multi-tool tracking. It also defines three distinct trajectory perspectives: (1) the full intraoperative lifetime of a tool across the procedure, (2) its intracorporeal cycles inside the body, and (3) its visibility duration within the camera's field of view (see Fig. 1). Tracking tools under all three perspectives simultaneously is referred to as multi-perspective tracking. CholecTrack20 offers rich multi-perspective annotations that can accommodate diverse surgical needs, yet to date no deep learning model has been applied to it for automatic tool tracking.

To develop a method suited to multi-perspective multi-class multi-tool tracking in surgical videos, we first benchmark 10 state-of-the-art detection methods on CholecTrack20 and conduct an extensive ablation study of re-ID approaches suited to the surgical domain. The re-ID module plays a key role in maintaining the temporal consistency of tool identities across a surgical video. Challenges nonetheless persist owing to the tools' complex motion patterns, frequent occlusions, and the limited field of view of the surgical scene. In particular, when several instances of the same tool class share identical appearance features, re-identifying them after occlusion, exit from the camera view, or re-insertion into the surgical scene is a difficult task.
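To make the three trajectory perspectives defined above concrete, the sketch below traces one hypothetical grasper through a brief exit from the camera view and a later full withdrawal and re-insertion. The tuple layout is our own illustration, not the CholecTrack20 annotation schema:

```python
# Illustration only: not the actual CholecTrack20 label schema.
# Each tuple: (frame, intraoperative_id, intracorporeal_id, visibility_id)
grasper_track = [
    (10,  1, 1, 1),  # grasper visible on camera, inside the body
    (50,  1, 1, 1),  # still visible: all three identities persist
    (80,  1, 1, 2),  # re-enters the view after briefly leaving the frame:
                     # new visibility id, same intracorporeal id
    (120, 1, 2, 3),  # re-inserted after full withdrawal from the body:
                     # new intracorporeal and visibility ids; the
                     # intraoperative id spans the whole procedure
]
```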
Unlike existing methods, our preliminary experiments show that relying on tool appearance cues alone for track discrimination is suboptimal, particularly when separating instances of the same class. To address this, we introduce domain knowledge, specifically tool usage patterns and information about the tool operators. The latter, the tool operator, refers to the hand of the surgeon manipulating the tool, and it discriminates same-class tool instances more accurately than appearance features. Operator information, however, is not directly observable in surgical endoscopic images, which makes predicting it automatically a challenge. Motivated by these findings, we propose SurgiTrack, a novel deep learning method for surgical tool tracking. SurgiTrack approximates the operator's hand motion by the originating direction of the tool and employs an attention mechanism to encode the tool's direction of motion, effectively modeling the unobserved surgeon hands or trocar positions for tool re-identification. Our design allows the direction estimator to be trained by self-supervision on datasets without operator labels, achieving performance comparable to supervised alternatives. This ensures that our method can be explored on surgical datasets lacking operator labels. Furthermore, to handle the multi-perspective nature of tool trajectories, our network associates tracks with a harmonizing bipartite matching graph algorithm, which, beyond conventional linear assignment, resolves identity conflicts across trajectory perspectives and improves the overall accuracy of track identity re-assignment.
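As a rough illustration of direction-based re-identification, the following minimal sketch (not the authors' implementation) compares direction embeddings of open tracks and fresh detections by cosine similarity and assigns identities with a standard linear-assignment solver. The embedding inputs, the 0.5 similarity gate, and all function names are assumptions:

```python
# Hedged sketch of direction-feature re-identification. Embeddings and
# the similarity gate are illustrative assumptions, not SurgiTrack's
# actual values; in the paper the direction features come from an
# attention-based estimator rather than the placeholders assumed here.
import numpy as np
from scipy.optimize import linear_sum_assignment

def cosine_sim(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity between the row vectors of a and b."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-8)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-8)
    return a @ b.T

def associate(track_dirs: np.ndarray, det_dirs: np.ndarray,
              min_sim: float = 0.5):
    """Match open tracks to current detections by direction similarity.

    track_dirs: (T, D) direction embeddings of the open tracks.
    det_dirs:   (N, D) direction embeddings of the new detections.
    Returns (track_idx, det_idx) pairs whose similarity passes the gate.
    """
    sim = cosine_sim(track_dirs, det_dirs)
    # The Hungarian solver minimizes cost, so negate the similarity.
    rows, cols = linear_sum_assignment(-sim)
    return [(r, c) for r, c in zip(rows, cols) if sim[r, c] >= min_sim]
```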
In summary, our contributions are as follows: we formalize the modeling of multi-perspective tool tracking and benchmark state-of-the-art methods on the CholecTrack20 dataset; we develop SurgiTrack, a tool tracking model built on self-supervised attention-based motion direction estimation and harmonizing bipartite graph matching; and we extensively evaluate tool tracking across the different trajectory perspectives, covering multiple video frame rates and visual challenges such as bleeding, smoke, and occlusion.
Together, these contributions advance research on surgical tool tracking and support the further development of computer-assisted surgery systems and AI-driven interventional technologies.
Abstract
Accurate tool tracking is essential for the success of computer-assisted intervention. Previous efforts often modeled tool trajectories rigidly, overlooking the dynamic nature of surgical procedures, especially tracking scenarios like out-of-body and out-of-camera views. Addressing this limitation, the new CholecTrack20 dataset provides detailed labels that account for multiple tool trajectories in three perspectives: (1) intraoperative, (2) intracorporeal, and (3) visibility, representing the different types of temporal duration of tool tracks. These fine-grained labels enhance tracking flexibility but also increase the task complexity. Re-identifying tools after occlusion or re-insertion into the body remains challenging due to high visual similarity, especially among tools of the same category. This work recognizes the critical role of the tool operators in distinguishing tool track instances, especially those belonging to the same tool category. The operators' information is, however, not explicitly captured in surgical videos. We therefore propose SurgiTrack, a novel deep learning method that leverages YOLOv7 for precise tool detection and employs an attention mechanism to model the originating direction of the tools, as a proxy to their operators, for tool re-identification. To handle diverse tool trajectory perspectives, SurgiTrack employs a harmonizing bipartite matching graph, minimizing conflicts and ensuring accurate tool identity association. Experimental results on CholecTrack20 demonstrate SurgiTrack's effectiveness, outperforming baselines and state-of-the-art methods with real-time inference capability. This work sets a new standard in surgical tool tracking, providing dynamic trajectories for more adaptable and precise assistance in minimally invasive surgeries.
Method
We present SurgiTrack, a deep learning method for surgical tool tracking based on tool direction-of-motion features. SurgiTrack is designed as a multi-class multi-object tracking (MCMOT) model capable of tracking tools jointly across multiple trajectory perspectives, namely visibility, intracorporeal, and intraoperative. The motivation to track beyond the camera's field of view is to offer more flexible trajectories that ensure continuous and reliable identification of surgical tools, tailored to the complex dynamics of a surgical scene, preventing errors and maintaining safety even when tools temporarily move out of view. The architecture of our proposed tracking model is conceptually divided into the main components of object tracking: spatial detection and data association, with the latter further split into re-identification feature modeling and track identity matching, as illustrated in Fig. 3(a).
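To show how these components could fit together, here is a hypothetical per-frame skeleton of such a detector, re-ID model, and matcher pipeline. The class interfaces are our assumptions and do not reproduce the paper's code:

```python
# Hypothetical per-frame loop mirroring the decomposition described
# above: spatial detection -> re-ID feature modeling -> track identity
# matching. All interfaces are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class Track:
    track_id: int
    box: tuple        # (x, y, w, h) of the last matched detection
    feature: object   # latest re-ID (direction) embedding

@dataclass
class Tracker:
    detector: object      # e.g., a YOLOv7 wrapper returning boxes
    reid_model: object    # maps detections to direction embeddings
    matcher: object       # bipartite matcher over feature similarity
    tracks: list = field(default_factory=list)
    next_id: int = 0

    def step(self, frame):
        boxes = self.detector.detect(frame)
        feats = self.reid_model.embed(frame, boxes)
        matches, unmatched = self.matcher.match(self.tracks, feats)
        for t_idx, d_idx in matches:   # refresh matched tracks
            self.tracks[t_idx].box = boxes[d_idx]
            self.tracks[t_idx].feature = feats[d_idx]
        for d_idx in unmatched:        # open a track for each new tool
            self.tracks.append(Track(self.next_id, boxes[d_idx], feats[d_idx]))
            self.next_id += 1
        return self.tracks
```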
Conclusion
In this work, we propose SurgiTrack, a novel deep learning approach for multi-class multi-tool tracking in surgical videos. Our approach utilizes an attention-based deep learning model for tool identity association by learning the tool motion direction, which we conceived as a proxy linking the tools to the operating surgeons' hands via the trocars. We demonstrate that the motion direction features are superior to location, appearance, and similarity features for the re-identification of surgical tools given the non-distinctiveness of most tools' appearance, especially the ones from the same or similar classes. We show that the direction features can be learnt in 3 different paradigms of full-, weak-, and self-supervision depending on the availability of training labels. We also design a harmonizing bipartite matching graph to enable non-conflicting and synchronized tracking of tools across three perspectives of intraoperative, intracorporeal, and visibility within the camera field of view, which represent the various ways of considering the temporal duration of a tool trajectory. Additionally, we benchmark several deep learning methods for tool detection and tracking on the newly introduced CholecTrack20 dataset and conduct ablation studies on the suitability of existing re-identification features for accurate tool tracking. Our proposed model emerges as a promising solution for multi-class multi-tool tracking in surgical procedures, showcasing adaptability across different training paradigms and demonstrating strong performance in essential tracking metrics. We also evaluate our model across different surgical visual challenges such as bleeding, smoke, occlusion, camera fouling, light reflection, etc., and present insightful findings on their impact on visual tracking in surgical videos. Qualitative results also show that our method is effective in handling challenging situations compared to the baselines and can effortlessly track tools irrespective of the video frame sampling rate.
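As a final illustration of the harmonizing idea, the hedged sketch below propagates identities fixed at the fine-grained visibility level up to the intracorporeal and intraoperative levels, so that the three views can never disagree on a detection. It is a deliberate simplification of the paper's matching graph, with all names and the id-allocation rule assumed for illustration:

```python
# Simplified illustration of the conflict-avoidance idea behind
# harmonized matching: coarser perspectives inherit, and never
# contradict, identities fixed at the finer visibility level. The
# actual harmonizing bipartite matching graph is more involved.
def harmonize(vis_ids, intra_map, op_map):
    """vis_ids:  detection index -> visibility track id (finest level).
    intra_map:   visibility id -> intracorporeal id (a missing key
                 means a new in-body cycle starts here).
    op_map:      intracorporeal id -> intraoperative id.
    Returns per-detection ids under all three perspectives."""
    out = {}
    for det, vid in sorted(vis_ids.items()):
        iid = intra_map.setdefault(
            vid, max(intra_map.values(), default=0) + 1)
        oid = op_map.setdefault(
            iid, max(op_map.values(), default=0) + 1)
        out[det] = (vid, iid, oid)
    return out
```

Because the maps are shared and only ever extended, a visibility track that was already linked to an intracorporeal cycle keeps that link on every later frame, which is the non-conflicting behavior the conclusion describes.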
Results
First, our base detector, YOLOv7 (Wang et al., 2023a), yields 80.6%