CaltechAUTHORS
  A Caltech Library Service

Building an information retrieval test collection for spontaneous conversational speech

Oard, Douglas W. and Soergel, Dagobert and Doermann, David and Huang, Xiaoli and Murray, G. Craig and Wang, Jianqiang and Ramabhadran, Bhuvana and Franz, Martin and Gustman, Samuel and Mayfield, James and Kharevych, Liliya and Strassel, Stephanie (2004) Building an information retrieval test collection for spontaneous conversational speech. In: SIGIR '04 Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval. ACM, New York, NY, pp. 41-48. ISBN 1-58113-881-4. https://resolver.caltech.edu/CaltechAUTHORS:20161207-162256089

Full text is not posted in this repository. Consult Related URLs below.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20161207-162256089

Abstract

Test collections model use cases in ways that facilitate evaluation of information retrieval systems. This paper describes the use of search-guided relevance assessment to create a test collection for retrieval of spontaneous conversational speech. Approximately 10,000 thematically coherent segments were manually identified in 625 hours of oral history interviews with 246 individuals. Automatic speech recognition results, manually prepared summaries, controlled vocabulary indexing, and name authority control are available for every segment. Those features were leveraged by a team of four relevance assessors to identify topically relevant segments for 28 topics developed from actual user requests. Search-guided assessment yielded sufficient inter-annotator agreement to support formative evaluation during system development. Baseline results for ranked retrieval are presented to illustrate use of the collection.


Item Type: Book Section
Related URLs:
URL | URL Type | Description
https://doi.org/10.1145/1008992.1009002 | DOI | Article
http://dl.acm.org/citation.cfm?doid=1008992.1009002 | Publisher | Article
Additional Information: © 2004 ACM. Thanks to Anton Leuski for help building queries and Meghan Glenn for comments. This work has been supported in part by NSF IIS Award 0122466 and NSF CISE RI Award EIA0130422. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.
Funders:
Funding Agency | Grant Number
NSF | 0122466
NSF | EIA0130422
Subject Keywords: Experimentation, Measurement, Automatic Speech Recognition, Search-Guided Relevance Assessment, Oral History
Classification Code: H.3.3 [Information Storage and Retrieval]: Information Search and Retrieval
DOI: 10.1145/1008992.1009002
Record Number: CaltechAUTHORS:20161207-162256089
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20161207-162256089
Official Citation: Douglas W. Oard, Dagobert Soergel, David Doermann, Xiaoli Huang, G. Craig Murray, Jianqiang Wang, Bhuvana Ramabhadran, Martin Franz, Samuel Gustman, James Mayfield, Liliya Kharevych, and Stephanie Strassel. 2004. Building an information retrieval test collection for spontaneous conversational speech. In Proceedings of the 27th annual international ACM SIGIR conference on Research and development in information retrieval (SIGIR '04). ACM, New York, NY, USA, 41-48. DOI=http://dx.doi.org/10.1145/1008992.1009002
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 72643
Collection: CaltechAUTHORS
Deposited By: Kristin Buxton
Deposited On: 08 Dec 2016 00:43
Last Modified: 11 Nov 2021 05:04
