
Improving Generalization by Data Categorization

Li, Ling and Pratap, Amrit and Lin, Hsuan-Tien and Abu-Mostafa, Yaser S. (2005) Improving Generalization by Data Categorization. In: Knowledge Discovery in Databases: PKDD 2005. Lecture Notes in Computer Science. No. 3721. Springer, Berlin, pp. 157-168. ISBN 978-3-540-29244-9.

Full text is not posted in this repository. Consult Related URLs below.


In most learning algorithms, examples in the training set are treated equally. Some examples, however, carry more reliable or critical information about the target than others, and some may carry wrong information. According to their intrinsic margin, examples can be grouped into three categories: typical, critical, and noisy. We propose three methods, namely the selection cost, SVM confidence margin, and AdaBoost data weight, to automatically group training examples into these three categories. Experimental results on artificial datasets show that, although the three methods are quite different in nature, they give similar and reasonable categorizations. Results with real-world datasets further demonstrate that treating the three data categories differently in learning can improve generalization.

Item Type: Book Section
Related URLs:
Additional Information: © 2005 Springer-Verlag Berlin Heidelberg. We thank Anelia Angelova, Marcelo Medeiros, Carlos Pedreira, David Soloveichik and the anonymous reviewers for helpful discussions. This work was mainly done in 2003 and was supported by the Caltech Center for Neuromorphic Systems Engineering under the US NSF Cooperative Agreement EEC-9402726.
Funding Agency: Center for Neuromorphic Systems Engineering, Caltech (Grant Number: UNSPECIFIED)
Subject Keywords: Support Vector Machine; Learning Algorithm; Target Function; Selection Cost; Intrinsic Function
Series Name: Lecture Notes in Computer Science
Issue or Number: 3721
Record Number: CaltechAUTHORS:20190702-142717858
Persistent URL:
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 96892
Deposited By: Tony Diaz
Deposited On: 08 Jul 2019 17:24
Last Modified: 16 Nov 2021 17:24
