CaltechAUTHORS
  A Caltech Library Service

An information theoretic approach to rule induction from databases

Smyth, Padhraic and Goodman, Rodney M. (1992) An information theoretic approach to rule induction from databases. IEEE Transactions on Knowledge and Data Engineering, 4 (4). pp. 301-316. ISSN 1041-4347. doi:10.1109/69.149926. https://resolver.caltech.edu/CaltechAUTHORS:20190314-155127061

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20190314-155127061

Abstract

The knowledge acquisition bottleneck in obtaining rules directly from an expert is well known. Hence, the problem of automated rule acquisition from data is a well-motivated one, particularly for domains where a database of sample data exists. In this paper we introduce a novel algorithm for the induction of rules from examples. The algorithm is novel in the sense that it not only learns rules for a given concept (classification), but it simultaneously learns rules relating multiple concepts. This type of learning, known as generalized rule induction, is considerably more general than existing algorithms, which tend to be classification oriented. Initially we focus on the problem of determining a quantitative, well-defined rule preference measure. In particular, we propose a quantity called the J-measure as an information theoretic alternative to existing approaches. The J-measure quantifies the information content of a rule or a hypothesis. We outline the information theoretic origins of this measure and examine its plausibility as a hypothesis preference measure. We then define the ITRULE algorithm, which uses the newly proposed measure to learn a set of optimal rules from a set of data samples, and we conclude the paper with an analysis of experimental results on real-world data.
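
The abstract does not state the J-measure formula. As an illustration only, the sketch below uses the form in which the J-measure is commonly cited for a rule "IF Y=y THEN X=x": the probability that the rule fires, p(y), multiplied by the cross entropy between the posterior and prior distributions of a binary X. The function name `j_measure` and its parameter names are assumptions made for this example and are not taken from the paper.

```python
import math


def j_measure(p_y: float, p_x: float, p_x_given_y: float) -> float:
    """Illustrative J-measure (in bits) of a rule IF Y=y THEN X=x.

    p_y         -- probability that the rule's left-hand side holds
    p_x         -- prior probability of the right-hand side
    p_x_given_y -- probability of the right-hand side given the left-hand side

    Names and signature are hypothetical; the formula is the commonly
    cited form, not a transcription from the paper.
    """
    if not 0.0 < p_x < 1.0:
        raise ValueError("p_x must lie strictly between 0 and 1")
    if not (0.0 <= p_y <= 1.0 and 0.0 <= p_x_given_y <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")

    def term(posterior: float, prior: float) -> float:
        # By convention 0 * log(0 / q) is taken to be 0.
        if posterior == 0.0:
            return 0.0
        return posterior * math.log2(posterior / prior)

    # Cross entropy between the posterior and prior of the binary variable X.
    cross_entropy = term(p_x_given_y, p_x) + term(1.0 - p_x_given_y, 1.0 - p_x)
    # Weight by how often the rule applies.
    return p_y * cross_entropy


if __name__ == "__main__":
    # A rule that fires half the time and turns a 50/50 prior into certainty.
    print(j_measure(p_y=0.5, p_x=0.5, p_x_given_y=1.0))
```

Under this form, a rule whose conclusion is no more likely after the condition is observed than before scores zero, and a rule that rarely fires is discounted by the p(y) factor even when it is very accurate.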


Item Type:Article
Related URLs:
URL | URL Type | Description
https://doi.org/10.1109/69.149926 | DOI | Article
Additional Information:© 1992 IEEE. Manuscript received October 25, 1989; revised April 16, 1990. This work was supported in part by Pacific Bell, in part by the U.S. Army Research Office under Contract DAAL03-89-K-0126, and in part by the California Institute of Technology’s program in Advanced Technologies sponsored by Aerojet General, General Motors, and TRW. Part of this work was carried out by the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration. The authors gratefully acknowledge the assistance of David Aha of the University of California-Irvine in providing the voting data set, and also Brian Gaines of the University of Calgary and Ross Quinlan of the University of Sydney for providing the chess data set.
Funders:
Funding Agency | Grant Number
Pacific Bell | UNSPECIFIED
Army Research Office (ARO) | DAAL03-89-K-0126
Program in Advanced Technologies, Caltech | UNSPECIFIED
Aerojet General | UNSPECIFIED
General Motors | UNSPECIFIED
TRW | UNSPECIFIED
NASA/JPL/Caltech | UNSPECIFIED
Subject Keywords:Cross entropy, expert systems, information theory, machine learning, knowledge acquisition, knowledge discovery, rule-based systems, rule induction
Issue or Number:4
DOI:10.1109/69.149926
Record Number:CaltechAUTHORS:20190314-155127061
Persistent URL:https://resolver.caltech.edu/CaltechAUTHORS:20190314-155127061
Official Citation:P. Smyth and R. M. Goodman, "An information theoretic approach to rule induction from databases," in IEEE Transactions on Knowledge and Data Engineering, vol. 4, no. 4, pp. 301-316, Aug. 1992. doi: 10.1109/69.149926
Usage Policy:No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code:93848
Collection:CaltechAUTHORS
Deposited By: George Porter
Deposited On:14 Mar 2019 23:28
Last Modified:16 Nov 2021 17:01
