An overview of the BioCreative 2012 Workshop Track III: interactive text mining task

Arighi, Cecilia N. and Van Auken, Kimberly and Li, Yuling and Chan, Juancarlos and Muller, Hans-Michael (2013) An overview of the BioCreative 2012 Workshop Track III: interactive text mining task. Database: The Journal of Biological Databases and Curation, 2013. Art. No. bas056. ISSN 1758-0463. PMCID PMC3625048. http://resolver.caltech.edu/CaltechAUTHORS:20130410-154721740

Abstract

In many databases, biocuration primarily involves literature curation, which usually involves retrieving relevant articles, extracting information that will translate into annotations and identifying new incoming literature. As the volume of biological literature increases, the use of text mining to assist in biocuration becomes increasingly relevant. A number of groups have developed tools for text mining from a computer science/linguistics perspective, and there are many initiatives to curate some aspect of biology from the literature. Some biocuration efforts already make use of a text mining tool, but there have not been many broad-based systematic efforts to study which aspects of a text mining tool contribute to its usefulness for a curation task. Here, we report on an effort to bring together text mining tool developers and database biocurators to test the utility and usability of tools. Six text mining systems presenting diverse biocuration tasks participated in a formal evaluation, and appropriate biocurators were recruited for testing. The performance results from this evaluation indicate that some of the systems were able to improve efficiency of curation by speeding up the curation task significantly (∼1.7- to 2.5-fold) over manual curation. In addition, some of the systems were able to improve annotation accuracy when compared with the performance on the manually curated set. In terms of inter-annotator agreement, the factors that contributed to significant differences for some of the systems included the expertise of the biocurator on the given curation task, the inherent difficulty of the curation and attention to annotation guidelines. After the task, annotators were asked to complete a survey to help identify strengths and weaknesses of the various systems. The analysis of this survey highlights how important task completion is to the biocurators’ overall experience of a system, regardless of the system’s high score on design, learnability and usability. In addition, strategies to refine the annotation guidelines and systems documentation, to adapt the tools to the needs and query types the end user might have and to evaluate performance in terms of efficiency, user interface, result export and traditional evaluation metrics have been analyzed during this task. This analysis will help to plan for a more intense study in BioCreative IV.


Item Type: Article
Related URLs:
URL | URL Type | Description
http://dx.doi.org/10.1093/database/bas056 | DOI | Article
http://database.oxfordjournals.org/content/2013/bas056 | Publisher | Article
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3625048/ | PubMed Central | Article
ORCID:
Author | ORCID
Van Auken, Kimberly | 0000-0002-1706-4196
Additional Information: © 2013 Published by Oxford University Press on behalf of US Government. Submitted 10 July 2012; Revised 27 November 2012; Accepted 28 November 2012. National Science Foundation grant DBI-0850319; National Institutes of Health grant 5G08LM010720-02. The participation of Z.L. and W.J.W. was supported by the Intramural Research Program of the National Institutes of Health, National Library of Medicine. The participation of M.K. was supported by CONSOLIDER grant CSD2007-00050 and MICROME grant 222886-2.
Funders:
Funding Agency | Grant Number
NSF | DBI-0850319
NIH | 5G08LM010720-02
NIH Intramural Research Program | UNSPECIFIED
Ministerio de Ciencia e Innovación (MCINN) | CSD2007-00050
MICROME grant | 222886-2
PubMed Central ID: PMC3625048
Record Number: CaltechAUTHORS:20130410-154721740
Persistent URL: http://resolver.caltech.edu/CaltechAUTHORS:20130410-154721740
Official Citation: Arighi, C.N., Carterette, B., Cohen, K.B., et al. An overview of the BioCreative 2012 Workshop Track III: interactive text mining. Database (2012) Vol. 2012: article ID bas056; doi:10.1093/database/bas056
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 37873
Collection: CaltechAUTHORS
Deposited By: Jason Perez
Deposited On: 11 Apr 2013 14:37
Last Modified: 30 Oct 2017 20:11