What is the Value of Data? on Mathematical Methods for Data Quality Estimation

Raviv, Netanel and Jain, Siddharth and Bruck, Jehoshua (2020) What is the Value of Data? on Mathematical Methods for Data Quality Estimation. In: 2020 IEEE International Symposium on Information Theory (ISIT). IEEE, Piscataway, NJ, pp. 2825-2830. ISBN 9781728164328. https://resolver.caltech.edu/CaltechAUTHORS:20200831-142053055

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20200831-142053055

Abstract

Data is one of the most important assets of the information age, and its societal impact is undisputed. Yet, rigorous methods of assessing the quality of data are lacking. In this paper, we propose a formal definition for the quality of a given dataset. We assess a dataset’s quality by a quantity we call the expected diameter, which measures the expected disagreement between two randomly chosen hypotheses that explain it, and has recently found applications in active learning. We focus on Boolean hyperplanes, and utilize a collection of Fourier analytic, algebraic, and probabilistic methods to come up with theoretical guarantees and practical solutions for the computation of the expected diameter. We also study the behaviour of the expected diameter on algebraically structured datasets, conduct experiments that validate this notion of quality, and demonstrate the feasibility of our techniques.
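
As an illustration of the expected diameter described in the abstract, the following is a minimal brute-force sketch, not the authors' method. It assumes hypotheses are weight vectors w in {-1, 1}^n that label a point x in {-1, 1}^n by sign(w . x), that n is odd so ties cannot occur, that both hypotheses are drawn independently and uniformly from those consistent with the dataset, and that disagreement is measured over all points of {-1, 1}^n. The function names are hypothetical, and the exhaustive enumeration is only feasible for very small n.

```python
from itertools import product


def predict(w, x):
    """Label x with the sign of the inner product w . x (n odd, so never zero)."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else -1


def version_space(dataset, n):
    """All w in {-1, 1}^n that reproduce every label in the dataset."""
    return [
        w
        for w in product((-1, 1), repeat=n)
        if all(predict(w, x) == y for x, y in dataset)
    ]


def disagreement(w1, w2, points):
    """Fraction of points on which the two hypotheses give different labels."""
    return sum(predict(w1, x) != predict(w2, x) for x in points) / len(points)


def expected_diameter(dataset, n):
    """Average disagreement over all ordered pairs of consistent hypotheses.

    Assumption: pairs are drawn independently and uniformly from the
    version space, so a hypothesis may be paired with itself.
    """
    if n % 2 == 0:
        raise ValueError("this sketch assumes odd n to avoid ties")
    points = list(product((-1, 1), repeat=n))
    consistent = version_space(dataset, n)
    if not consistent:
        raise ValueError("no hypothesis is consistent with the dataset")
    total = sum(disagreement(a, b, points) for a in consistent for b in consistent)
    return total / len(consistent) ** 2


if __name__ == "__main__":
    # One labelled point narrows the set of consistent hypotheses.
    print(expected_diameter([], 3))
    print(expected_diameter([((1, 1, 1), 1)], 3))
```

Under these assumptions, a dataset that leaves many mutually disagreeing hypotheses consistent yields a large value, while a dataset that pins down a single hypothesis yields zero.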


Item Type: Book Section
Related URLs:
  URL                                              URL Type  Description
  https://doi.org/10.1109/isit44484.2020.9174311   DOI       Article
  https://arxiv.org/abs/2001.03464                 arXiv     Discussion Paper
ORCID:
  Author            ORCID
  Raviv, Netanel    0000-0002-1686-1994
  Jain, Siddharth   0000-0002-9164-6119
  Bruck, Jehoshua   0000-0001-8474-0812
Additional Information: © 2020 IEEE.
DOI: 10.1109/isit44484.2020.9174311
Record Number: CaltechAUTHORS:20200831-142053055
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20200831-142053055
Official Citation: N. Raviv, S. Jain and J. Bruck, "What is the Value of Data? on Mathematical Methods for Data Quality Estimation," 2020 IEEE International Symposium on Information Theory (ISIT), Los Angeles, CA, USA, 2020, pp. 2825-2830, doi: 10.1109/ISIT44484.2020.9174311
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 105178
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 08 Sep 2020 19:09
Last Modified: 16 Nov 2021 18:40
