
Validation of Machine Learning-Based Automated Surgical Instrument Annotation Using Publicly Available Intraoperative Video

Markarian, Nicholas and Kugener, Guillaume and Pangal, Dhiraj J. and Unadkat, Vyom and Sinha, Aditya and Zhu, Yichao and Roshannai, Arman and Chan, Justin and Hung, Andrew J. and Wrobel, Bozena B. and Anandkumar, Animashree and Zada, Gabriel and Donoho, Daniel A. (2022) Validation of Machine Learning-Based Automated Surgical Instrument Annotation Using Publicly Available Intraoperative Video. Operative Neurosurgery, 23 (3). pp. 235-240. ISSN 2332-4252. doi:10.1227/ons.0000000000000274.

Full text is not posted in this repository. Consult Related URLs below.

BACKGROUND: Intraoperative tool movement data have been demonstrated to be clinically useful in quantifying surgical performance. However, collecting this information from intraoperative video requires laborious hand annotation. The ability to automatically annotate tools in surgical video would advance surgical data science by eliminating a time-intensive step in research.

OBJECTIVE: To determine whether machine learning (ML) can automatically identify surgical instruments in neurosurgical video.

METHODS: An ML model that automatically identifies surgical instruments in frame was developed and trained on multiple publicly available surgical video data sets with instrument location annotations. A total of 39 693 frames from 4 data sets were used: endoscopic endonasal surgery (EEA) (30 015 frames), cataract surgery (4670), laparoscopic cholecystectomy (2532), and microscope-assisted brain/spine tumor removal (2476). A second model trained only on EEA video was also developed. Intraoperative EEA videos from YouTube were used as test data (3 videos, 1239 frames).

RESULTS: The YouTube data set contained 2169 instruments in total. Mean average precision (mAP) for instrument detection on the YouTube data set was 0.74; the mAP for the individual videos was 0.65, 0.74, and 0.89. The second model, trained only on EEA video, also achieved an overall mAP of 0.74 (0.62, 0.84, and 0.88 for the individual videos). Development costs were $130 for manual video annotation and under $100 for computation.

CONCLUSION: Surgical instruments in endoscopic endonasal intraoperative video can be detected by a fully automated ML model. Adding disparate surgical data sets did not improve model performance, although these data sets may improve the model's generalizability in other use cases.
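The paper does not publish its evaluation code, but the mean average precision (mAP) figures it reports are a standard object-detection metric. The sketch below is an illustrative, simplified implementation of average precision for a single instrument class at one IoU threshold (the function names, box format, and the single-threshold, all-point area approximation are assumptions, not the authors' actual pipeline); mAP would then be the mean of this value across classes.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def average_precision(predictions, ground_truths, iou_threshold=0.5):
    """AP for one class.

    predictions   -- list of (confidence_score, box) tuples
    ground_truths -- list of boxes
    A prediction counts as a true positive if it overlaps an
    as-yet-unmatched ground-truth box with IoU >= iou_threshold.
    """
    predictions = sorted(predictions, key=lambda p: -p[0])  # highest score first
    matched = [False] * len(ground_truths)
    tp = []  # 1 for true positive, 0 for false positive, in score order
    for score, box in predictions:
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths):
            if not matched[j]:
                v = iou(box, gt)
                if v > best_iou:
                    best_iou, best_j = v, j
        if best_iou >= iou_threshold:
            matched[best_j] = True
            tp.append(1)
        else:
            tp.append(0)
    # Walk the ranked list, accumulating precision * recall-increment
    # (a rectangle approximation of the area under the PR curve).
    ap, cum_tp, prev_recall = 0.0, 0, 0.0
    for i, t in enumerate(tp):
        cum_tp += t
        precision = cum_tp / (i + 1)
        recall = cum_tp / len(ground_truths)
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap
```

For example, a model that finds one of two instruments (plus one spurious box) scores an AP of 0.5: the detected box contributes precision 1.0 over a recall increment of 0.5, while the false positive adds nothing. Benchmark suites typically refine this by interpolating precision and averaging over several IoU thresholds.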

Item Type: Article
Related URLs: Article
ORCID:
Kugener, Guillaume: 0000-0002-4697-2847
Pangal, Dhiraj J.: 0000-0001-7391-9825
Hung, Andrew J.: 0000-0002-7201-6736
Anandkumar, Animashree: 0000-0002-6974-6797
Zada, Gabriel: 0000-0001-5821-902X
Donoho, Daniel A.: 0000-0002-0531-1436
Issue or Number: 3
Record Number: CaltechAUTHORS:20220908-194215690
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 116626
Deposited By: Donna Wrublewski
Deposited On: 07 Sep 2022 22:58
Last Modified: 23 May 2023 21:05
