
Overview of the gene ontology task at BioCreative IV

Mao, Yuqing and Van Auken, Kimberly (2014) Overview of the gene ontology task at BioCreative IV. Database: The Journal of Biological Databases and Curation, 2014. Art. No. bau086. ISSN 1758-0463. PMCID PMC4142793.

Gene Ontology (GO) annotation is a common task among model organism databases (MODs) for capturing gene function data from journal articles. It is a time-consuming and labor-intensive task, and is thus often considered as one of the bottlenecks in literature curation. There is a growing need for semiautomated or fully automated GO curation techniques that will help database curators to rapidly and accurately identify gene function information in full-length articles. Despite multiple attempts in the past, few studies have proven to be useful with regard to assisting real-world GO curation. The shortage of sentence-level training data and opportunities for interaction between text-mining developers and GO curators has limited the advances in algorithm development and corresponding use in practical circumstances. To this end, we organized a text-mining challenge task for literature-based GO annotation in BioCreative IV. More specifically, we developed two subtasks: (i) to automatically locate text passages that contain GO-relevant information (a text retrieval task) and (ii) to automatically identify relevant GO terms for the genes in a given article (a concept-recognition task). With the support from five MODs, we provided teams with >4000 unique text passages that served as the basis for each GO annotation in our task data. Such evidence text information has long been recognized as critical for text-mining algorithm development but was never made available because of the high cost of curation. In total, seven teams participated in the challenge task. From the team results, we conclude that the state of the art in automatically mining GO terms from literature has improved over the past decade while much progress is still needed for computer-assisted GO curation. 
Future work should focus on addressing remaining technical challenges for improved performance of automatic GO concept recognition and incorporating practical benefits of text-mining tools into real-world GO annotation.

Item Type:Article
ORCID:Van Auken, Kimberly, 0000-0002-1706-4196
Additional Information:© 2014 Oxford University Press. This work is written by US Government employees and is in the public domain in the US. Received 10 February 2014; Revised 28 July 2014; Accepted 29 July 2014. Published online August 25, 2014. The authors would like to thank Lynette Hirschman, John Wilbur, Cathy Wu, Kevin Cohen, Martin Krallinger and Thomas Wiegers from the BioCreative IV organizing committee for their support, and Judith Blake, Andrew Chatr-aryamontri, Sherri Matis, Fiona McCarthy, Sandra Orchard and Phoebe Roberts from the BioCreative IV User Advisory Group for their helpful discussions. This research is supported by NIH Intramural Research Program, National Library of Medicine (Y.M. and Z.L.). The BioCreative IV Workshop is funded by NSF/DBI-0850319. WormBase is funded by National Human Genome Research Institute [U41-HG002223] and the Gene Ontology Consortium by National Human Genome Research Institute (NHGRI) [U41-HG002273]. FlyBase is funded by an NHGRI/NIH grant [U41-HG000739] and the UK Medical Research Council [G1000968]. Team 238 is funded by NSF/ABI-0845523 (H.L. and D.Z.), NIH R01LM009959A1 (H.L. and D.Z.). The SIBtex (Swiss Institute of Bioinformatics) team has been partially supported by the SNF (neXtpresso #153437) and the European Union (Khresmoi #257528). Conflict of interest. None declared.
Funding Agency: Grant Number
National Human Genome Research Institute: UNSPECIFIED
National Library of Medicine: UNSPECIFIED
Medical Research Council (UK): G1000968
Swiss National Fund (SNF): neXtpresso #153437
European Union: Khresmoi #257528
PubMed Central ID:PMC4142793
Record Number:CaltechAUTHORS:20140902-131532950
Official Citation:Yuqing Mao, Kimberly Van Auken, Donghui Li, Cecilia N. Arighi, Peter McQuilton, G. Thomas Hayman, Susan Tweedie, Mary L. Schaeffer, Stanley J. F. Laulederkind, Shur-Jen Wang, Julien Gobeill, Patrick Ruch, Anh Tuan Luu, Jung-jae Kim, Jung-Hsien Chiang, Yu-De Chen, Chia-Jung Yang, Hongfang Liu, Dongqing Zhu, Yanpeng Li, Hong Yu, Ehsan Emadzadeh, Graciela Gonzalez, Jian-Ming Chen, Hong-Jie Dai, and Zhiyong Lu. Overview of the gene ontology task at BioCreative IV. Database 2014: bau086. doi:10.1093/database/bau086. Published online August 25, 2014.
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:49119
Deposited By: Ruth Sustaita
Deposited On:03 Sep 2014 18:53
Last Modified:21 Jul 2017 22:48
