CaltechAUTHORS
  A Caltech Library Service

Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models

Nimmagadda, Tejaswi and Anandkumar, Anima (2015) Multi-Object Classification and Unsupervised Scene Understanding Using Deep Learning Features and Latent Tree Probabilistic Models. (Unpublished) http://resolver.caltech.edu/CaltechAUTHORS:20190401-162932108

Use this Persistent URL to link to this item: http://resolver.caltech.edu/CaltechAUTHORS:20190401-162932108

Abstract

Deep learning has shown state-of-the-art classification performance on datasets such as ImageNet, which contain a single object in each image. However, multi-object classification is far more challenging. We present a unified framework which leverages the strengths of multiple machine learning methods, namely deep learning, probabilistic models, and kernel methods, to obtain state-of-the-art performance on Microsoft COCO, which consists of non-iconic images. We incorporate contextual information in natural images through a conditional latent tree probabilistic model (CLTM), where the object co-occurrences are conditioned on the fc7 features extracted from a pre-trained ImageNet CNN. We learn the CLTM tree structure using conditional pairwise probabilities for object co-occurrences, estimated through kernel methods, and we learn its node and edge potentials by training a new 3-layer neural network, which takes fc7 features as input. Object classification is carried out via inference on the learnt conditional tree model, and we obtain significant gains in precision-recall and F-measures on MS-COCO, especially for difficult object categories. Moreover, the latent variables in the CLTM capture scene information: the images with top activations for a latent node have common themes, such as grassland or food scenes, and so on. In addition, we show that a simple k-means clustering of the inferred latent nodes alone significantly improves scene classification performance on the MIT-Indoor dataset, without the need for any retraining, and without using scene labels during training. Thus, we present a unified framework for multi-object classification and unsupervised scene understanding.
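
The k-means step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function `kmeans`, the array `latent_marginals`, and the synthetic two-group data are all hypothetical stand-ins, assuming only that each image is represented by a vector of inferred latent-node values and that plain Lloyd's algorithm is applied to those vectors.

```python
import numpy as np

def kmeans(features, k, n_iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct rows chosen at random.
    idx = rng.choice(len(features), size=k, replace=False)
    centroids = features[idx].astype(float)
    for _ in range(n_iters):
        # Squared Euclidean distance from every row to every centroid.
        diffs = features[:, None, :] - centroids[None, :, :]
        dists = (diffs ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new_centroids = centroids.copy()
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                new_centroids[j] = members.mean(axis=0)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical stand-in for inferred latent-node values (one row per image).
# Real inputs would come from inference on a trained CLTM; here two synthetic
# "scene types" are simulated as well-separated groups of vectors.
rng = np.random.default_rng(1)
scene_a = rng.normal(loc=0.1, scale=0.02, size=(20, 4))
scene_b = rng.normal(loc=0.9, scale=0.02, size=(20, 4))
latent_marginals = np.vstack([scene_a, scene_b])

labels, centroids = kmeans(latent_marginals, k=2)
```

On this toy input the two synthetic groups end up in separate clusters. No scene labels are used at any point, which mirrors the unsupervised setting the abstract describes; how clusters are then mapped to scene categories for evaluation is not specified in the abstract and is left out here.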


Item Type: Report or Paper (Discussion Paper)
Related URLs:
  URL: http://arxiv.org/abs/1505.00308
  URL Type: arXiv
  Description: Discussion Paper
Record Number: CaltechAUTHORS:20190401-162932108
Persistent URL: http://resolver.caltech.edu/CaltechAUTHORS:20190401-162932108
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 94348
Collection: CaltechAUTHORS
Deposited By: George Porter
Deposited On: 02 Apr 2019 22:55
Last Modified: 02 Apr 2019 22:55
