Semantic Segmentation in Surgery

Identifying different semantic regions in surgical scenes, such as various types of tissue, organs, and instruments, is critical for a robot to understand the current progress of the surgery, assess potential risks, and plan the next steps. Surgical scenes, however, present many challenges, including blood, smoke, motion blur, and varying lighting conditions.

HemoSet: The First Blood Segmentation Dataset for Automation of Hemostasis Management 2024

  • Hemorrhaging occurs in surgeries of all types, forcing surgeons to quickly adapt to the visual interference that results from blood rapidly filling the surgical field. Introducing automation into the crucial surgical task of hemostasis management would offload mental and physical tasks from the surgeon and surgical assistants while simultaneously increasing the efficiency and safety of the operation. The first step in automating hemostasis management is detection of blood in the surgical field. To propel the development of blood detection algorithms in surgery, we present HemoSet, the first blood segmentation dataset based on bleeding during live animal robotic surgery. Our dataset features vessel hemorrhage scenarios where turbulent flow leads to abnormal pooling geometries in the surgical field. These pools form under conditions endemic to surgical procedures: uneven, heterogeneous tissue, glossy lighting, and rapid tool movement. We benchmark several state-of-the-art segmentation models and provide insight into the difficulties specific to blood detection. We intend for HemoSet to spur the development of autonomous blood suction tools by providing a platform for training and refining blood segmentation models, addressing the precision needed for such robotics.
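Benchmarking blood segmentation models, as the paper does, usually reduces to overlap metrics between predicted and ground-truth masks. The sketch below shows one way to compute per-image IoU and Dice for a binary blood mask; the function name, the 0.5 probability threshold, and the empty-vs-empty convention are illustrative assumptions, not HemoSet's official evaluation code.

    import numpy as np

    def iou_and_dice(pred: np.ndarray, gt: np.ndarray, thresh: float = 0.5):
        """Per-image IoU and Dice for a binary blood mask.

        pred: per-pixel blood probabilities, float array of shape (H, W).
        gt:   binary ground-truth mask of shape (H, W).
        """
        pred_bin = pred >= thresh
        gt_bin = gt.astype(bool)
        inter = np.logical_and(pred_bin, gt_bin).sum()
        union = np.logical_or(pred_bin, gt_bin).sum()
        total = pred_bin.sum() + gt_bin.sum()
        iou = inter / union if union > 0 else 1.0   # empty-vs-empty counts as perfect
        dice = 2 * inter / total if total > 0 else 1.0
        return iou, dice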

Reducing Annotating Load: Active Learning with Synthetic Images in Surgical Instrument Segmentation 2024

  • Accurate instrument segmentation in the endoscopic vision of minimally invasive surgery is challenging due to complex instruments and environments. Deep learning techniques have shown competitive performance in recent years. However, deep learning usually requires a large amount of labeled data to achieve accurate prediction, which imposes a significant annotation workload. To alleviate this workload, we propose an active learning-based framework that generates synthetic images for efficient neural network training. In each active learning iteration, a small number of informative unlabeled images are first queried by active learning and manually labeled. Next, synthetic images are generated based on these selected images: the instruments and backgrounds are cropped out and randomly combined, with blending and fusion near the boundary. The proposed method leverages the advantages of both active learning and synthetic images. Its effectiveness is validated on two sinus surgery datasets and one intra-abdominal surgery dataset. The results indicate a considerable performance improvement, especially when the annotated dataset is small.
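The composition step described above (crop instruments and backgrounds, recombine with blending near the boundary) can be approximated with a soft alpha matte. The sketch below is one plausible implementation under the assumption that the instrument image, its binary mask, and the background share the same size; the function name and kernel size are illustrative choices, not the paper's exact recipe.

    import cv2
    import numpy as np

    def synthesize(instrument_img, instrument_mask, background_img, ksize=15):
        """Paste a masked instrument onto a new background with soft blending.

        instrument_mask: uint8 mask (255 = instrument) aligned with
        instrument_img; all three images must share height and width.
        """
        # Blurring the mask yields an alpha matte that only softens pixels
        # near the instrument boundary, fusing the paste seam.
        alpha = cv2.GaussianBlur(instrument_mask, (ksize, ksize), 0) / 255.0
        alpha = alpha[..., None]                  # (H, W, 1) for broadcasting
        composite = alpha * instrument_img + (1.0 - alpha) * background_img
        return composite.astype(np.uint8)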

Multi-frame Feature Aggregation for Real-time Instrument Segmentation in Endoscopic Video 2021

  • Deep learning-based methods have achieved promising results on surgical instrument segmentation. However, their high computation cost may limit the application of deep models to time-sensitive tasks such as online surgical video analysis for robot-assisted surgery. Moreover, current methods may still suffer from challenging conditions in surgical images such as varying lighting and the presence of blood. We propose a novel Multi-frame Feature Aggregation (MFFA) module that aggregates video frame features temporally and spatially in a recurrent mode. By distributing the computational load of deep feature extraction over sequential frames, we can use a lightweight encoder to reduce the computation cost at each time step. Moreover, since public surgical videos are usually not labeled frame by frame, we develop a method that randomly synthesizes a surgical frame sequence from a single labeled frame to assist network training. We demonstrate that our approach achieves performance superior to corresponding deeper segmentation models on two public surgery datasets.
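The core idea of spreading feature-extraction cost over time can be illustrated with a toy recurrent aggregator: a cheap per-frame encoder plus a learned gate that mixes the current frame's features with an aggregated memory. This is a minimal sketch of that idea, not the paper's actual MFFA module; the architecture, channel count, and class name are assumptions.

    import torch
    import torch.nn as nn

    class RecurrentFeatureAggregator(nn.Module):
        """Toy recurrent aggregation of per-frame features."""

        def __init__(self, channels: int = 32):
            super().__init__()
            self.encoder = nn.Sequential(   # stand-in for a lightweight backbone
                nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
                nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            )
            self.gate = nn.Conv2d(2 * channels, channels, 1)  # memory/current fusion
            self.head = nn.Conv2d(channels, 1, 1)             # instrument logits

        def forward(self, frames: torch.Tensor) -> torch.Tensor:
            # frames: (T, 3, H, W) clip; returns per-frame logits (T, 1, H, W).
            memory, logits = None, []
            for t in range(frames.shape[0]):
                feat = self.encoder(frames[t:t + 1])
                if memory is None:
                    memory = feat
                else:
                    g = torch.sigmoid(self.gate(torch.cat([memory, feat], 1)))
                    memory = g * feat + (1 - g) * memory  # gated temporal update
                logits.append(self.head(memory))
            return torch.cat(logits, 0)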

Towards Better Surgical Instrument Segmentation in Endoscopic Vision: Multi-Angle Feature Aggregation and Contour Supervision 2020

  • Accurate and real-time surgical instrument segmentation is important in the endoscopic vision of robot-assisted surgery, and significant challenges are posed by frequent instrument-tissue contact and continuous changes of observation perspective. To address these challenges, an increasing number of deep neural network (DNN) models have been designed in recent years. We propose a general, embeddable approach that improves current DNN segmentation models without increasing the number of model parameters. First, observing the limited rotation invariance of DNNs, we propose the Multi-Angle Feature Aggregation (MAFA) method, which leverages active image rotation to gain richer visual cues and makes the prediction more robust to instrument orientation changes. Second, in the end-to-end training stage, auxiliary contour supervision is used to guide the model toward boundary awareness, so that the contour of the segmentation mask is more precise. The proposed method is validated with ablation experiments on the novel Sinus-Surgery datasets collected from surgeons' operations, and is compared to existing methods on a public dataset collected with a da Vinci Xi robot.
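The rotation-robustness idea behind MAFA is easiest to see at the prediction level: rotate the input, segment, rotate the prediction back, and aggregate. The sketch below does exactly that with 90-degree rotations and simple averaging; the paper aggregates features inside the network, so treat this as an illustrative simplification with assumed angles and an assumed fully convolutional model.

    import torch

    @torch.no_grad()
    def multi_angle_predict(model, image: torch.Tensor) -> torch.Tensor:
        """Average segmentation logits over 90-degree rotations of the input.

        image: (1, 3, H, W); model is assumed fully convolutional, so the
        logits keep the spatial size of whichever rotation produced them.
        """
        logits = []
        for k in range(4):                       # 0, 90, 180, 270 degrees
            rotated = torch.rot90(image, k, dims=(2, 3))
            pred = model(rotated)
            logits.append(torch.rot90(pred, -k, dims=(2, 3)))  # undo rotation
        return torch.stack(logits).mean(dim=0)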

LC-GAN: Image-to-Image Translation Based on Generative Adversarial Network for Endoscopic Images 2020

  • Intelligent vision is appealing in computer-assisted and robotic surgeries. Vision-based analysis with deep learning usually requires large labeled datasets, but manual data labeling is expensive and time-consuming in medical problems. We investigate a novel cross-domain strategy to reduce the need for manual data labeling by proposing live-cadaver GAN (LC-GAN), an image-to-image translation model based on generative adversarial networks (GANs). We consider a situation in which a labeled cadaveric surgery dataset is available while the task is instrument segmentation on an unlabeled live surgery dataset. We train LC-GAN to learn the mappings between cadaveric and live images. For live image segmentation, we first translate the live images to fake-cadaveric images with LC-GAN and then perform segmentation on the fake-cadaveric images with models trained on the real cadaveric dataset. The proposed method makes full use of the labeled cadaveric dataset for live image segmentation without the need to label the live dataset. LC-GAN has two generators with different architectures that leverage the deep feature representation learned from the cadaveric-image segmentation task. Moreover, we propose a structural similarity loss and a segmentation consistency loss to improve semantic consistency during translation. Our model achieves better image-to-image translation and leads to improved segmentation performance in the proposed cross-domain segmentation task.
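At inference time, the cross-domain strategy is a two-stage pipeline: translate the live frame into the labeled (cadaveric) domain, then run the cadaveric-trained segmentation model. A minimal sketch of that flow, with argument names and the [-1, 1] input range assumed rather than taken from the paper:

    import torch

    @torch.no_grad()
    def segment_live_frame(live_img, generator_live2cad, seg_model_cadaveric):
        """Segment a live frame by first translating it to the cadaveric domain.

        live_img: (1, 3, H, W) live-surgery frame, e.g. normalized to [-1, 1].
        """
        fake_cadaveric = generator_live2cad(live_img)   # domain translation
        return seg_model_cadaveric(fake_cadaveric)      # segment in source domain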

Video-Based Automatic and Objective Endoscopic Sinus Surgery Skill Assessment 2020

  • There is an increasing need for automatic and objective surgical skill assessment methods to make the surgeon training process more effective. We study an automatic skill assessment system based on surgical instrument tip trajectories extracted from cadaveric trans-nasal endoscopic sinus surgery videos. We propose a tracking algorithm that combines a segmentation-based instrument tip detector with a Kalman filter. For surgical skill assessment, we explore four new motion-related metrics. The proposed method has been tested on 10 surgery videos from 4 experts and 5 trainees and has shown its potential for automatic surgical skill assessment.
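Combining a per-frame tip detector with a Kalman filter is a standard constant-velocity tracking setup: predict every frame, correct only when the detector fires. The sketch below uses OpenCV's cv2.KalmanFilter; the noise covariances and the constant-velocity model are assumptions, not values from the paper.

    import cv2
    import numpy as np

    def make_tip_tracker(dt: float = 1.0) -> cv2.KalmanFilter:
        """Constant-velocity Kalman filter over the state [x, y, vx, vy]."""
        kf = cv2.KalmanFilter(4, 2)          # 4 state dims, 2 measured dims
        kf.transitionMatrix = np.array([[1, 0, dt, 0],
                                        [0, 1, 0, dt],
                                        [0, 0, 1, 0],
                                        [0, 0, 0, 1]], np.float32)
        kf.measurementMatrix = np.array([[1, 0, 0, 0],
                                         [0, 1, 0, 0]], np.float32)
        kf.processNoiseCov = 1e-3 * np.eye(4, dtype=np.float32)
        kf.measurementNoiseCov = 1e-1 * np.eye(2, dtype=np.float32)
        return kf

    kf = make_tip_tracker()
    for detection in [(120.0, 85.0), None, (124.0, 90.0)]:  # toy detections
        predicted = kf.predict()         # estimate even when the detector misses
        if detection is not None:
            kf.correct(np.array(detection, np.float32).reshape(2, 1))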

Automatic Sinus Surgery Skill Assessment Based on Instrument Segmentation and Tracking in Endoscopic Video 2019

  • Current surgical skill assessment relies mainly on evaluations by senior surgeons, a tedious process influenced by subjectivity. The contradiction between a growing number of surgical techniques and duty-hour limits for residents leads to an increasing need for effective surgical skill assessment. In this paper, we explore an automatic surgical skill assessment method by tracking and analyzing surgical trajectories in a new dataset of endoscopic cadaveric trans-nasal sinus surgery videos. Tracking is performed by combining deep convolutional neural network based segmentation with a dense optical flow algorithm. The heat maps and motion metrics of the tip trajectories are then extracted and analyzed. The proposed method has been tested on 10 endoscopic videos of sinus surgery performed by 4 expert and 5 novice surgeons, showing its potential for automatic surgical skill assessment.
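Dense optical flow lets the tracker carry the tip position from one frame to the next between segmentations, and the resulting trajectory feeds the motion metrics. The sketch below uses OpenCV's Farneback flow with its commonly cited default parameters, plus a path-length metric as one example; the helper names and the metric choice are mine, not the paper's.

    import cv2
    import numpy as np

    def propagate_tip(prev_gray, next_gray, tip_xy):
        """Move a tip position to the next frame via dense optical flow.

        prev_gray, next_gray: consecutive grayscale frames, uint8 (H, W).
        tip_xy: (x, y) tip position inside prev_gray.
        """
        flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        x, y = int(round(tip_xy[0])), int(round(tip_xy[1]))
        dx, dy = flow[y, x]                  # flow is indexed (row, col) -> (dx, dy)
        return (tip_xy[0] + dx, tip_xy[1] + dy)

    def path_length(trajectory):
        """Total path length of a tip trajectory, one simple motion metric."""
        pts = np.asarray(trajectory, dtype=float)
        return float(np.linalg.norm(np.diff(pts, axis=0), axis=1).sum())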
