Su, Jiahao and Byeon, Wonmin and Kossaifi, Jean and Huang, Furong and Kautz, Jan and Anandkumar, Animashree (2020) Convolutional Tensor-Train LSTM for Spatio-temporal Learning. In: Advances in Neural Information Processing Systems 33 pre-proceedings (NeurIPS 2020). Advances in Neural Information Processing Systems. https://resolver.caltech.edu/CaltechAUTHORS:20200402-134911700
PDF - Published Version (2MB). See Usage Policy.

PDF - Supplemental Material (3MB). See Usage Policy.

Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20200402-134911700
Abstract
Learning from spatio-temporal data has numerous applications, such as human-behavior analysis, object tracking, video compression, and physics simulation. However, existing methods still perform poorly on challenging video tasks such as long-term forecasting, because these tasks require learning long-term spatio-temporal correlations in the video sequence. In this paper, we propose a higher-order convolutional LSTM model that can efficiently learn these correlations, along with a succinct representation of the history. This is accomplished through a novel tensor-train module that performs prediction by combining convolutional features across time. To make this feasible in terms of computation and memory requirements, we propose a novel convolutional tensor-train decomposition of the higher-order model. This decomposition reduces the model complexity by jointly approximating a sequence of convolutional kernels as a low-rank tensor-train factorization. As a result, our model outperforms existing approaches, including the baseline models, while using only a fraction of their parameters. Our results achieve state-of-the-art performance in a wide range of applications and datasets, including multi-step video prediction on the Moving-MNIST-2 and KTH action datasets as well as early activity recognition on the Something-Something V2 dataset.
| Item Type: | Book Section |
|---|---|
| Related URLs: | |
| Additional Information: | This work was done while the first author, Jiahao Su, was an intern at NVIDIA. Project page: https://sites.google.com/nvidia.com/conv-tt-lstm. Su was also partially supported by the startup fund from the Department of Computer Science of the University of Maryland and National Science Foundation IIS-1850220 CRII Award 030742-00001. Furong Huang was supported by Adobe, Capital One, and JP Morgan faculty fellowships. |
| Funders: | |
| DOI: | 10.48550/arXiv.2002.09131 |
| Record Number: | CaltechAUTHORS:20200402-134911700 |
| Persistent URL: | https://resolver.caltech.edu/CaltechAUTHORS:20200402-134911700 |
| Usage Policy: | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code: | 102272 |
| Collection: | CaltechAUTHORS |
| Deposited By: | Tony Diaz |
| Deposited On: | 02 Apr 2020 20:59 |
| Last Modified: | 02 Jun 2023 01:03 |