
Visual speech synthesis from 3D mesh sequences driven by combined speech features

Publication: Contribution to book/report/conference proceedings › Conference paper › Research › Peer-reviewed

Authorship

Research metrics (PlumX)
  • Citations
    • Citation indexes: 4
    • Patent family citations: 1
  • Captures
    • Readers: 3

Details

Original language: English
Title of host publication: 2017 IEEE International Conference on Multimedia and Expo
Subtitle: ICME 2017
Publisher: IEEE Computer Society
Pages: 1075-1080
Number of pages: 6
ISBN (electronic): 9781509060672
Publication status: Published - 28 Aug 2017
Event: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017 - Hong Kong, Hong Kong
Duration: 10 July 2017 - 14 July 2017

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (print): 1945-7871
ISSN (electronic): 1945-788X

Abstract

Given a pre-registered 3D mesh sequence and accompanying phoneme-labeled audio, our system creates an animatable face model and a mapping procedure to produce realistic speech animations for arbitrary speech input. Mapping of speech features to model parameters is done using random forests for regression. We propose a new speech feature based on phonemic labels and acoustic features. The novel feature produces more expressive facial animation and it robustly handles temporal labeling errors. Furthermore, by employing a sliding window approach to feature extraction, the system is easy to train and allows for low-delay synthesis. We show that our novel combination of speech features improves visual speech synthesis. Our findings are confirmed by a subjective user study.
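
The abstract outlines a pipeline: a combined per-frame speech feature (phoneme labels plus acoustic features) is extracted over a sliding window and mapped to face-model parameters with random forest regression. Below is a minimal sketch of that mapping stage in Python, using scikit-learn's RandomForestRegressor as a stand-in regressor; all array shapes, window sizes, and helper names (sliding_windows, phoneme_onehot, face_params) are illustrative assumptions, not the authors' implementation.

# Sketch of the mapping stage described in the abstract: random forest
# regression from windowed, combined speech features (one-hot phoneme
# labels + acoustic features) to face-model parameters.
# All shapes, window sizes, and names are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def sliding_windows(frames: np.ndarray, radius: int) -> np.ndarray:
    # Stack each frame with its +/- radius neighbors (edge-padded),
    # so every training example carries short-term temporal context.
    padded = np.pad(frames, ((radius, radius), (0, 0)), mode="edge")
    return np.stack(
        [padded[t : t + 2 * radius + 1].ravel() for t in range(len(frames))]
    )

# Hypothetical per-frame inputs for one training sequence.
T, n_phonemes, n_acoustic, n_params = 500, 40, 13, 30
rng = np.random.default_rng(0)
phoneme_onehot = np.eye(n_phonemes)[rng.integers(0, n_phonemes, T)]
acoustic = rng.standard_normal((T, n_acoustic))   # e.g. MFCC-like features
face_params = rng.standard_normal((T, n_params))  # face-model parameters per frame

# Combined feature: concatenate label and acoustic streams, then window them.
combined = np.hstack([phoneme_onehot, acoustic])
X = sliding_windows(combined, radius=5)  # a short window keeps synthesis delay low

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, face_params)  # scikit-learn forests support multi-output regression

# At synthesis time, the same windowing maps new speech to animation parameters.
predicted_params = forest.predict(X[:10])

Because each prediction depends only on a fixed-radius window of past and future frames, the regressor never needs the full utterance, which is what makes low-delay synthesis possible in this setup.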

ASJC Scopus subject areas

Cite this

Visual speech synthesis from 3D mesh sequences driven by combined speech features. / Kuhnke, Felix; Ostermann, Jörn.
2017 IEEE International Conference on Multimedia and Expo: ICME 2017. IEEE Computer Society, 2017. pp. 1075-1080, 8019546 (Proceedings - IEEE International Conference on Multimedia and Expo).


Kuhnke, F & Ostermann, J 2017, Visual speech synthesis from 3D mesh sequences driven by combined speech features. in 2017 IEEE International Conference on Multimedia and Expo: ICME 2017., 8019546, Proceedings - IEEE International Conference on Multimedia and Expo, IEEE Computer Society, pp. 1075-1080, 2017 IEEE International Conference on Multimedia and Expo, ICME 2017, Hong Kong, Hong Kong, 10 July 2017. https://doi.org/10.1109/icme.2017.8019546
Kuhnke, F., & Ostermann, J. (2017). Visual speech synthesis from 3D mesh sequences driven by combined speech features. In 2017 IEEE International Conference on Multimedia and Expo: ICME 2017 (pp. 1075-1080). Article 8019546 (Proceedings - IEEE International Conference on Multimedia and Expo). IEEE Computer Society. https://doi.org/10.1109/icme.2017.8019546
Kuhnke F, Ostermann J. Visual speech synthesis from 3D mesh sequences driven by combined speech features. In: 2017 IEEE International Conference on Multimedia and Expo: ICME 2017. IEEE Computer Society. 2017. pp. 1075-1080. 8019546. (Proceedings - IEEE International Conference on Multimedia and Expo). doi: 10.1109/icme.2017.8019546
Kuhnke, Felix ; Ostermann, Jörn. / Visual speech synthesis from 3D mesh sequences driven by combined speech features. 2017 IEEE International Conference on Multimedia and Expo: ICME 2017. IEEE Computer Society, 2017. pp. 1075-1080 (Proceedings - IEEE International Conference on Multimedia and Expo).
BibTeX
@inproceedings{07e3e28e69eb4517b9945274201e4c6b,
title = "Visual speech synthesis from 3D mesh sequences driven by combined speech features",
abstract = "Given a pre-registered 3D mesh sequence and accompanying phoneme-labeled audio, our system creates an animatable face model and a mapping procedure to produce realistic speech animations for arbitrary speech input. Mapping of speech features to model parameters is done using random forests for regression. We propose a new speech feature based on phonemic labels and acoustic features. The novel feature produces more expressive facial animation and it robustly handles temporal labeling errors. Furthermore, by employing a sliding window approach to feature extraction, the system is easy to train and allows for low-delay synthesis. We show that our novel combination of speech features improves visual speech synthesis. Our findings are confirmed by a subjective user study.",
keywords = "Facial Animation, Lip Synchronization, Speech Features, Visual Speech Synthesis",
author = "Felix Kuhnke and J{\"o}rn Ostermann",
year = "2017",
month = aug,
day = "28",
doi = "10.1109/icme.2017.8019546",
language = "English",
series = "Proceedings - IEEE International Conference on Multimedia and Expo",
publisher = "IEEE Computer Society",
pages = "1075--1080",
booktitle = "2017 IEEE International Conference on Multimedia and Expo",
address = "United States",
note = "2017 IEEE International Conference on Multimedia and Expo, ICME 2017 ; Conference date: 10-07-2017 Through 14-07-2017",

}

RIS

TY - GEN

T1 - Visual speech synthesis from 3D mesh sequences driven by combined speech features

AU - Kuhnke, Felix

AU - Ostermann, Jörn

PY - 2017/8/28

Y1 - 2017/8/28

N2 - Given a pre-registered 3D mesh sequence and accompanying phoneme-labeled audio, our system creates an animatable face model and a mapping procedure to produce realistic speech animations for arbitrary speech input. Mapping of speech features to model parameters is done using random forests for regression. We propose a new speech feature based on phonemic labels and acoustic features. The novel feature produces more expressive facial animation and it robustly handles temporal labeling errors. Furthermore, by employing a sliding window approach to feature extraction, the system is easy to train and allows for low-delay synthesis. We show that our novel combination of speech features improves visual speech synthesis. Our findings are confirmed by a subjective user study.

AB - Given a pre-registered 3D mesh sequence and accompanying phoneme-labeled audio, our system creates an animatable face model and a mapping procedure to produce realistic speech animations for arbitrary speech input. Mapping of speech features to model parameters is done using random forests for regression. We propose a new speech feature based on phonemic labels and acoustic features. The novel feature produces more expressive facial animation and it robustly handles temporal labeling errors. Furthermore, by employing a sliding window approach to feature extraction, the system is easy to train and allows for low-delay synthesis. We show that our novel combination of speech features improves visual speech synthesis. Our findings are confirmed by a subjective user study.

KW - Facial Animation

KW - Lip Synchronization

KW - Speech Features

KW - Visual Speech Synthesis

UR - http://www.scopus.com/inward/record.url?scp=85030238866&partnerID=8YFLogxK

U2 - 10.1109/icme.2017.8019546

DO - 10.1109/icme.2017.8019546

M3 - Conference contribution

AN - SCOPUS:85030238866

T3 - Proceedings - IEEE International Conference on Multimedia and Expo

SP - 1075

EP - 1080

BT - 2017 IEEE International Conference on Multimedia and Expo

PB - IEEE Computer Society

T2 - 2017 IEEE International Conference on Multimedia and Expo, ICME 2017

Y2 - 10 July 2017 through 14 July 2017

ER -
