Published September 2015 | Published + Submitted + Supplemental Material
Book Section - Chapter Open

Describing Common Human Visual Actions in Images


Which common human actions and interactions are recognizable in monocular still images? Which involve objects and/or other people? How many actions is a person performing at a time? We address these questions by exploring the actions and interactions that are detectable in the images of the MS COCO dataset. We make two main contributions. The first is a list of 140 common 'visual actions', obtained by analyzing the largest online verb lexicon currently available for English (VerbNet) and the human-written sentences used to describe images in MS COCO. The second is a complete set of annotations for those 'visual actions', each composed of a subject-object pair and the associated verb, which we call COCO-a (a for 'actions'). COCO-a is larger than existing action datasets in terms of the number of action instances, and is unique because it is data-driven rather than experimenter-biased. Other unique features are that it is exhaustive and that all subjects and objects are localized. We provide a statistical analysis of the accuracy of our annotations and of each action, interaction, and subject-object combination.
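
As a rough illustration of the annotation structure described above (a localized subject, a localized object, and the associated verbs), the following Python sketch shows one way such records might be represented and how the number of actions per person could be counted. The class name, field names, bounding-box convention, and example verbs are all assumptions made for illustration; this is not the released COCO-a file format.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Assumed box convention for this sketch: (x, y, width, height) in pixels.
BBox = Tuple[float, float, float, float]


@dataclass(frozen=True)
class VisualActionAnnotation:
    """Hypothetical record: one subject, one optional object, and its verbs."""
    image_id: int
    subject_box: BBox                 # localized person performing the action
    object_box: Optional[BBox]        # assumed None when no object is involved
    object_category: Optional[str]    # illustrative label such as "cup"
    visual_actions: Tuple[str, ...]   # illustrative verbs for this pair


def actions_per_subject(
    annotations: Iterable[VisualActionAnnotation],
) -> Dict[Tuple[int, BBox], int]:
    """Count the distinct visual actions attached to each (image, subject)."""
    per_subject: Dict[Tuple[int, BBox], set] = {}
    for ann in annotations:
        key = (ann.image_id, ann.subject_box)
        per_subject.setdefault(key, set()).update(ann.visual_actions)
    return {key: len(actions) for key, actions in per_subject.items()}


# Made-up example data: one person holding and drinking from a cup while sitting.
person = (10.0, 20.0, 50.0, 120.0)
example = [
    VisualActionAnnotation(1, person, (40.0, 60.0, 30.0, 30.0), "cup", ("hold", "drink")),
    VisualActionAnnotation(1, person, None, None, ("sit",)),
]
counts = actions_per_subject(example)
```

Grouping by subject rather than by subject-object pair is a design choice of this sketch: it mirrors the abstract's question of how many actions a single person performs at a time.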

Additional Information

© 2015. The copyright of this document resides with its authors. It may be distributed unchanged freely in print or electronic forms.

Attached Files

Submitted - 1506.02203.pdf

Published - BMVC15_DescribingCommonVisualActions_PAPER.pdf

Supplemental Material - BMVC15_DescribingCommonVisualActions_SUPP.pdf

Supplemental Material - sup052.zip



Additional details

August 20, 2023