Bhatnagar, Vasudha and Dobariyal, Rashmi and Jain, Priya and Mahabal, Ashish (2012) Data Understanding using Semi-Supervised Clustering. In: 2012 Conference on Intelligent Data Understanding. IEEE, Piscataway, NJ, pp. 118-123. ISBN 978-1-4673-4627-6. https://resolver.caltech.edu/CaltechAUTHORS:20170207-173845918
Full text is not posted in this repository. Consult Related URLs below.
Abstract
In the era of e-science, most scientific endeavors depend on intensive data analysis to understand the underlying physical phenomena. Predictive modeling is one of the popular machine learning tasks undertaken in such endeavors. The labeled data used for training a predictive model reflects the current understanding of the domain. In this paper we introduce data understanding as a computational problem and propose a solution for enhancing domain understanding based on semi-supervised clustering. The proposed DU-SSC (Data Understanding using Semi-Supervised Clustering) algorithm is incremental, parameterless, and performs a single scan of the data. The given labeled (training) data is discretized at a user-specified resolution, and finer (micro) data distributions are identified within classes, along with outliers. The discovery process is based on grouping similar instances in data space while taking into account the degree of influence each attribute exercises on the class label. The Maximal Information Coefficient (MIC) measure is used during similarity computations for this purpose. The study is supported by experiments, and a detailed account of the understanding gained is presented for two selected UCI data sets. General observations on nine other UCI data sets are presented, along with experiments that demonstrate the use of the discovered knowledge for improved classification.
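The abstract names three ingredients: discretizing labeled data at a chosen resolution, finding finer groups within each class along with outliers, and weighting attributes by their influence on the class label. The Python sketch below illustrates those ideas only loosely and is not the DU-SSC algorithm from the paper. The function names, the `min_count` outlier threshold, and the `weights` argument (a stand-in for per-attribute MIC scores, which are assumed to be computed elsewhere) are all hypothetical. The sketch also makes an extra pass to find attribute ranges, so it does not reproduce the single-scan property.

```python
import math
from collections import defaultdict


def discretize(value, lo, hi, bins):
    """Map a value in [lo, hi] to a bin index in 0..bins-1."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return min(max(idx, 0), bins - 1)


def micro_groups(rows, labels, bins, min_count=2):
    """Count instances per (class label, grid cell).

    Cells holding at least `min_count` instances are returned as groups;
    sparser cells are reported as outlier candidates. `min_count` is an
    illustrative assumption, not a parameter taken from the paper.
    """
    n_attrs = len(rows[0])
    lows = [min(r[j] for r in rows) for j in range(n_attrs)]
    highs = [max(r[j] for r in rows) for j in range(n_attrs)]

    counts = defaultdict(int)
    for row, label in zip(rows, labels):
        cell = tuple(
            discretize(row[j], lows[j], highs[j], bins) for j in range(n_attrs)
        )
        counts[(label, cell)] += 1

    groups = {key: n for key, n in counts.items() if n >= min_count}
    outliers = [key for key, n in counts.items() if n < min_count]
    return groups, outliers


def weighted_distance(a, b, weights):
    """Euclidean distance with per-attribute weights.

    `weights` stands in for attribute-influence scores such as MIC values;
    computing MIC itself is outside the scope of this sketch.
    """
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))
```

With a weight of zero, an attribute drops out of the distance entirely, which is the intuition behind letting attribute influence shape similarity.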
Item Type: | Book Section
---|---
Related URLs: |
ORCID: |
Additional Information: | © 2012 IEEE. This work was supported by grant Dean(R)/R&D/11/423 from University of Delhi. We are thankful to Manju Bharadwaj, Ramesh Aggarwal and Somitra Sanadhya for their comments and suggestions on this work.
Funders: |
DOI: | 10.1109/CIDU.2012.6382192
Record Number: | CaltechAUTHORS:20170207-173845918
Persistent URL: | https://resolver.caltech.edu/CaltechAUTHORS:20170207-173845918
Official Citation: | V. Bhatnagar, R. Dobariyal, P. Jain and A. Mahabal, "Data Understanding using Semi-Supervised Clustering," 2012 Conference on Intelligent Data Understanding, Boulder, CO, 2012, pp. 118-123. doi: 10.1109/CIDU.2012.6382192
Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: | 74139
Collection: | CaltechAUTHORS
Deposited By: | Kristin Buxton
Deposited On: | 08 Feb 2017 16:02
Last Modified: | 11 Nov 2021 05:25