Taylor, Sarah and Kim, Taehwan and Yue, Yisong and Mahler, Moshe and Krahe, James and Garcia Rodriguez, Anastasio and Hodgins, Jessica and Matthews, Iain (2017) A deep learning approach for generalized speech animation. ACM Transactions on Graphics, 36 (4). Art. 93. ISSN 0730-0301. doi:10.1145/3072959.3073699. https://resolver.caltech.edu/CaltechAUTHORS:20170814-143407341
PDF (Published Version, 3MB) - See Usage Policy.
Use this Persistent URL to link to this item: https://resolver.caltech.edu/CaltechAUTHORS:20170814-143407341
Abstract
We introduce a simple and effective deep learning approach to automatically generate natural looking speech animation that synchronizes to input speech. Our approach uses a sliding window predictor that learns arbitrary nonlinear mappings from phoneme label input sequences to mouth movements in a way that accurately captures natural motion and visual coarticulation effects. Our deep learning approach enjoys several attractive properties: it runs in real-time, requires minimal parameter tuning, generalizes well to novel input speech sequences, is easily edited to create stylized and emotional speech, and is compatible with existing animation retargeting approaches. One important focus of our work is to develop an effective approach for speech animation that can be easily integrated into existing production pipelines. We provide a detailed description of our end-to-end approach, including machine learning design decisions. Generalized speech animation results are demonstrated over a wide range of animation clips on a variety of characters and voices, including singing and foreign language input. Our approach can also generate on-demand speech animation in real-time from user speech input.
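To make the sliding window idea concrete, here is a minimal illustrative sketch in Python, not the authors' implementation: the toy phoneme inventory, the window sizes, the mouth-pose dimension, and the stand-in random linear "network" are all assumptions; the paper's actual model is a deep network trained on captured facial motion data. The sketch one-hot encodes a window of phoneme labels, predicts a short window of mouth-pose frames, and overlap-averages the predictions from successive windows into one smooth pose track.

```python
# Minimal sketch of a sliding-window phoneme-to-animation predictor.
# All sizes and the phoneme set below are illustrative assumptions,
# and the random linear layer W stands in for a trained deep network.
import numpy as np

PHONEMES = ["sil", "AA", "B", "K", "T"]   # toy phoneme inventory (assumption)
N_PHONES = len(PHONEMES)
WIN_IN   = 11                             # input window of phoneme labels, in frames
WIN_OUT  = 5                              # predicted mouth-pose frames per window
POSE_DIM = 8                              # mouth shape parameters per frame (assumption)

rng = np.random.default_rng(0)
W = rng.standard_normal((WIN_IN * N_PHONES, WIN_OUT * POSE_DIM)) * 0.01

def one_hot_window(labels):
    """One-hot encode a window of phoneme labels into a flat feature vector."""
    x = np.zeros((len(labels), N_PHONES))
    x[np.arange(len(labels)), labels] = 1.0
    return x.ravel()

def predict_sequence(label_seq):
    """Slide the input window over the sequence and overlap-average the
    per-window pose predictions into one smooth pose track (the blending
    scheme here is an assumption)."""
    T = len(label_seq)
    poses  = np.zeros((T, POSE_DIM))
    counts = np.zeros(T)
    half_in, half_out = WIN_IN // 2, WIN_OUT // 2
    padded = np.pad(label_seq, half_in, mode="edge")
    for t in range(T):
        feats = one_hot_window(padded[t:t + WIN_IN])      # window centered on frame t
        out = (feats @ W).reshape(WIN_OUT, POSE_DIM)      # poses for frames t-2..t+2
        lo, hi = max(0, t - half_out), min(T, t + half_out + 1)
        for f in range(lo, hi):
            poses[f] += out[f - t + half_out]
            counts[f] += 1
    return poses / counts[:, None]

# Example: a short frame-level phoneme label sequence.
seq = np.array([0, 0, 2, 1, 1, 1, 4, 0])
print(predict_sequence(seq).shape)  # (8, 8): one POSE_DIM vector per input frame
```

Because each output frame is the average of several overlapping window predictions, the resulting track is smooth without any separate post-filtering step; that averaging is one plausible way to realize the real-time, low-tuning behavior the abstract describes.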
| Item Type | Article |
|---|---|
| Related URLs | |
| ORCID | |
| Additional Information | © 2017 Copyright held by the owner/author(s). Publication rights licensed to ACM. We owe great thanks to our always accommodating and professional actor, Ken Bolden. Barry-John Theobald and Ausdang Thangthai contributed their HMM synthesis implementation. Scott Jones at Lucasfilm and Hao Li at USC generously provided facial rigs. Thanks to the diverse members of Disney Research Pittsburgh who recorded foreign language speech examples. The work was supported by EPSRC grant EP/M014053/1. |
| Funders | EPSRC (EP/M014053/1) |
| Subject Keywords | Computing methodologies → Neural networks; Procedural animation; Motion processing; Real-time simulation; Visual analytics; Speech Animation; Machine Learning |
| Issue or Number | 4 |
| DOI | 10.1145/3072959.3073699 |
| Record Number | CaltechAUTHORS:20170814-143407341 |
| Persistent URL | https://resolver.caltech.edu/CaltechAUTHORS:20170814-143407341 |
| Official Citation | Sarah Taylor, Taehwan Kim, Yisong Yue, Moshe Mahler, James Krahe, Anastasio Garcia Rodriguez, Jessica Hodgins, and Iain Matthews. 2017. A deep learning approach for generalized speech animation. ACM Trans. Graph. 36, 4, Article 93 (July 2017), 11 pages. DOI: https://doi.org/10.1145/3072959.3073699 |
| Usage Policy | No commercial reproduction, distribution, display or performance rights in this work are provided. |
| ID Code | 80375 |
| Collection | CaltechAUTHORS |
| Deposited By | INVALID USER |
| Deposited On | 14 Aug 2017 22:16 |
| Last Modified | 15 Nov 2021 19:30 |