
Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization

Zhao, Long and Wang, Yuxiao and Zhao, Jiaping and Yuan, Liangzhe and Sun, Jennifer J. and Schroff, Florian and Adam, Hartwig and Peng, Xi and Metaxas, Dimitris and Liu, Ting (2021) Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization. In: 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, Piscataway, NJ, pp. 12788-12797. ISBN 978-1-6654-4509-2. https://resolver.caltech.edu/CaltechAUTHORS:20220105-801103900

PDF (Accepted Version, 4MB). See Usage Policy.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20220105-801103900

Abstract

We introduce a novel representation learning method to disentangle pose-dependent as well as view-dependent factors from 2D human poses. The method trains a network using cross-view mutual information maximization (CV-MIM), which maximizes the mutual information of the same pose performed from different viewpoints in a contrastive learning manner. We further propose two regularization terms to ensure disentanglement and smoothness of the learned representations. The resulting pose representations can be used for cross-view action recognition. To evaluate the power of the learned representations, in addition to the conventional fully-supervised action recognition settings, we introduce a novel task called single-shot cross-view action recognition. This task trains models with actions from only one single viewpoint while models are evaluated on poses captured from all possible viewpoints. We evaluate the learned representations on standard benchmarks for action recognition, and show that (i) CV-MIM performs competitively compared with the state-of-the-art models in the fully-supervised scenarios; (ii) CV-MIM outperforms other competing methods by a large margin in the single-shot cross-view setting; and (iii) the learned representations can significantly boost the performance when reducing the amount of supervised training data. Our code is made publicly available at https://github.com/google-research/google-research/tree/master/poem.
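The contrastive objective described above (maximizing mutual information between embeddings of the same pose seen from different viewpoints) can be illustrated with an InfoNCE-style loss. This is a generic sketch in NumPy, not the authors' implementation; the function name, temperature value, and embedding shapes are illustrative assumptions.

```python
import numpy as np

def info_nce_loss(z_a, z_b, temperature=0.1):
    """Contrastive (InfoNCE-style) loss between two views.

    z_a, z_b: (batch, dim) pose embeddings from two camera views;
    row i of each is the same pose seen from a different viewpoint.
    Minimizing this loss pulls matching rows together and pushes all
    mismatched pairs apart, which lower-bounds the mutual information
    between the two views' representations.
    NOTE: illustrative sketch, not the paper's actual CV-MIM code.
    """
    # L2-normalize so the dot product is cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (batch, batch) similarities
    # Cross-entropy with the diagonal (matching view pairs) as targets.
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
# Identical embeddings across views: loss near its minimum.
aligned = info_nce_loss(z, z)
# Reversed second view: positives no longer match, loss is much higher.
shuffled = info_nce_loss(z, z[::-1])
print(aligned < shuffled)  # True
```

The paper's full method additionally applies disentanglement and smoothness regularizers on top of such a contrastive term; those are not shown here.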


Item Type: Book Section
Related URLs:
  Article (DOI): https://doi.org/10.1109/cvpr46437.2021.01260
  Discussion Paper (arXiv): https://arxiv.org/abs/2012.01405
  Code (Related Item): https://github.com/google-research/google-research/tree/master/poem
ORCID:
  Yuan, Liangzhe: 0000-0001-9206-1908
  Sun, Jennifer J.: 0000-0002-0906-6589
  Schroff, Florian: 0000-0003-0570-8967
Additional Information: © 2021 IEEE. This work was done while the author was a research intern at Google.
Funders:
  Google (grant number unspecified)
DOI: 10.1109/cvpr46437.2021.01260
Record Number: CaltechAUTHORS:20220105-801103900
Persistent URL: https://resolver.caltech.edu/CaltechAUTHORS:20220105-801103900
Official Citation: L. Zhao et al., "Learning View-Disentangled Human Pose Representation by Contrastive Cross-View Mutual Information Maximization," 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2021, pp. 12788-12797, doi: 10.1109/CVPR46437.2021.01260.
Usage Policy: No commercial reproduction, distribution, display or performance rights in this work are provided.
ID Code: 112724
Collection: CaltechAUTHORS
Deposited By: Tony Diaz
Deposited On: 09 Jan 2022 21:30
Last Modified: 25 Jul 2022 23:14
