A Semi-automatic Indexing Pipeline for Medical Document Retrieval in Resource-constrained Settings

Davison, Stephen and Avgil, Dana and Li, Yan and Yang, Sonia (2022) A Semi-automatic Indexing Pipeline for Medical Document Retrieval in Resource-constrained Settings. In: Twenty-eighth Americas Conference on Information Systems, 10-14 August 2022, Minneapolis, MN.

Full text is not posted in this repository.

Medical document indexing can benefit from both automation and human feedback. This research develops a semi-automatic indexing pipeline (SIP) for medical document retrieval in resource-constrained settings. The SIP includes an affordable and efficient automated process for preparing and indexing continuing medical education documents and a human feedback loop to validate recommended terms. It leverages pre-trained Named-entity Recognition models to identify appropriate terms from the MeSH vocabulary and higher-level subject terms from UMLS. The SIP achieved a precision of 59%, a recall of 64%, and an F1 score of 61% based on the expert evaluation of 124 distinct medical documents. The combination of automation with a human expert feedback loop demonstrates a model strategy for an affordable and practical approach to document indexing in resource-limited yet critical services. The SIP may be extended to other environments and information sources to improve the efficiency and accuracy of information retrieval.
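The reported scores can be related to one another with a short sketch. The abstract does not say how the scores were aggregated across the 124 documents, so the code below assumes simple set matching between recommended and expert-approved terms; the function names and the example terms are hypothetical and are not taken from the paper.

```python
def precision_recall_f1(recommended, approved):
    """Score recommended index terms against expert-approved terms.

    Assumption: a recommended term counts as correct only if it
    appears verbatim in the expert-approved set.
    """
    recommended, approved = set(recommended), set(approved)
    true_positives = len(recommended & approved)
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(approved) if approved else 0.0
    return precision, recall, f1_from(precision, recall)


def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Hypothetical terms for a single document.
recommended_terms = {"Hypertension", "Diabetes Mellitus", "Aspirin"}
approved_terms = {"Hypertension", "Aspirin", "Stroke", "Obesity"}
p, r, f1 = precision_recall_f1(recommended_terms, approved_terms)

# Consistency check on the figures reported in the abstract.
reported_f1 = f1_from(0.59, 0.64)
```

Under this definition, a precision of 59% and a recall of 64% give a harmonic mean of about 0.614, which agrees with the reported F1 score of 61%.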

Item Type: Conference or Workshop Item (Paper)
Davison, Stephen (ORCID 0000-0003-0102-8200)
Li, Yan (ORCID 0000-0002-0415-0140)
Additional Information: © 2022, the Author(s). This material is brought to you by the Americas Conference on Information Systems (AMCIS) at AIS Electronic Library (AISeL). It has been accepted for inclusion in AMCIS 2022 Proceedings by an authorized administrator of AIS Electronic Library (AISeL).
Subject Keywords: Medical document indexing, Human feedback loop, Natural Language Processing, Named-entity Recognition
Record Number: CaltechAUTHORS:20220818-172653986
Official Citation: Davison, Stephen; Avgil, Dana; Li, Yan; and Yang, Sonia, "A Semi-automatic Indexing Pipeline for Medical Document Retrieval in Resource-constrained Settings" (2022). AMCIS 2022 Proceedings. 4.
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 116358
Deposited By: Stephen Davison
Deposited On: 19 Aug 2022 18:34
Last Modified: 19 Aug 2022 18:34
