
Visual speech synthesis from 3D mesh sequences driven by combined speech features

Publication: Contribution to book/report/conference proceedings › Conference paper › Research › Peer-reviewed

Authorship

Research metrics (PlumX)
  • Citations
    • Citation indexes: 4
    • Patent family citations: 1
  • Captures
    • Readers: 3

Details

Original language: English
Title of host publication: 2017 IEEE International Conference on Multimedia and Expo
Subtitle: ICME 2017
Publisher: IEEE Computer Society
Pages: 1075-1080
Number of pages: 6
ISBN (electronic): 9781509060672
Publication status: Published - 28 Aug 2017
Event: 2017 IEEE International Conference on Multimedia and Expo, ICME 2017 - Hong Kong, Hong Kong
Duration: 10 July 2017 - 14 July 2017

Publication series

Name: Proceedings - IEEE International Conference on Multimedia and Expo
ISSN (print): 1945-7871
ISSN (electronic): 1945-788X

Abstract

Given a pre-registered 3D mesh sequence and accompanying phoneme-labeled audio, our system creates an animatable face model and a mapping procedure to produce realistic speech animations for arbitrary speech input. Mapping of speech features to model parameters is done using random forests for regression. We propose a new speech feature based on phonemic labels and acoustic features. The novel feature produces more expressive facial animation and it robustly handles temporal labeling errors. Furthermore, by employing a sliding window approach to feature extraction, the system is easy to train and allows for low-delay synthesis. We show that our novel combination of speech features improves visual speech synthesis. Our findings are confirmed by a subjective user study.
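
The abstract outlines a pipeline: a combined per-frame speech feature (phoneme labels plus acoustic features) is extracted over a sliding window and mapped to face-model parameters with random forest regression. Below is a minimal sketch of that mapping stage in Python, using scikit-learn's RandomForestRegressor as a stand-in regressor; all array shapes, window sizes, and helper names (sliding_windows, phoneme_onehot, face_params) are illustrative assumptions, not the authors' implementation.

# Sketch of the mapping stage described in the abstract: random forest
# regression from windowed, combined speech features (one-hot phoneme
# labels + acoustic features) to face-model parameters.
# All shapes, window sizes, and names are illustrative assumptions.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def sliding_windows(frames: np.ndarray, radius: int) -> np.ndarray:
    # Stack each frame with its +/- radius neighbors (edge-padded),
    # so every training example carries short-term temporal context.
    padded = np.pad(frames, ((radius, radius), (0, 0)), mode="edge")
    return np.stack(
        [padded[t : t + 2 * radius + 1].ravel() for t in range(len(frames))]
    )

# Hypothetical per-frame inputs for one training sequence.
T, n_phonemes, n_acoustic, n_params = 500, 40, 13, 30
rng = np.random.default_rng(0)
phoneme_onehot = np.eye(n_phonemes)[rng.integers(0, n_phonemes, T)]
acoustic = rng.standard_normal((T, n_acoustic))   # e.g. MFCC-like features
face_params = rng.standard_normal((T, n_params))  # face-model parameters per frame

# Combined feature: concatenate label and acoustic streams, then window them.
combined = np.hstack([phoneme_onehot, acoustic])
X = sliding_windows(combined, radius=5)  # a short window keeps synthesis delay low

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, face_params)  # scikit-learn forests support multi-output regression

# At synthesis time, the same windowing maps new speech to animation parameters.
predicted_params = forest.predict(X[:10])

Because each prediction depends only on a fixed-radius window of past and future frames, the regressor never needs the full utterance, which is what makes low-delay synthesis possible in this setup.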

ASJC Scopus subject areas

Cite this

Visual speech synthesis from 3D mesh sequences driven by combined speech features. / Kuhnke, Felix; Ostermann, Jörn.
2017 IEEE International Conference on Multimedia and Expo: ICME 2017. IEEE Computer Society, 2017. pp. 1075-1080, 8019546 (Proceedings - IEEE International Conference on Multimedia and Expo).


Kuhnke, F & Ostermann, J 2017, Visual speech synthesis from 3D mesh sequences driven by combined speech features. in 2017 IEEE International Conference on Multimedia and Expo: ICME 2017., 8019546, Proceedings - IEEE International Conference on Multimedia and Expo, IEEE Computer Society, pp. 1075-1080, 2017 IEEE International Conference on Multimedia and Expo, ICME 2017, Hong Kong, Hong Kong, 10 July 2017. https://doi.org/10.1109/icme.2017.8019546
Kuhnke, F., & Ostermann, J. (2017). Visual speech synthesis from 3D mesh sequences driven by combined speech features. In 2017 IEEE International Conference on Multimedia and Expo: ICME 2017 (pp. 1075-1080). Article 8019546 (Proceedings - IEEE International Conference on Multimedia and Expo). IEEE Computer Society. https://doi.org/10.1109/icme.2017.8019546
Kuhnke F, Ostermann J. Visual speech synthesis from 3D mesh sequences driven by combined speech features. In: 2017 IEEE International Conference on Multimedia and Expo: ICME 2017. IEEE Computer Society. 2017. pp. 1075-1080. 8019546. (Proceedings - IEEE International Conference on Multimedia and Expo). doi: 10.1109/icme.2017.8019546
Kuhnke, Felix ; Ostermann, Jörn. / Visual speech synthesis from 3D mesh sequences driven by combined speech features. 2017 IEEE International Conference on Multimedia and Expo: ICME 2017. IEEE Computer Society, 2017. pp. 1075-1080 (Proceedings - IEEE International Conference on Multimedia and Expo).
BibTeX
@inproceedings{07e3e28e69eb4517b9945274201e4c6b,
title = "Visual speech synthesis from 3D mesh sequences driven by combined speech features",
abstract = "Given a pre-registered 3D mesh sequence and accompanying phoneme-labeled audio, our system creates an animatable face model and a mapping procedure to produce realistic speech animations for arbitrary speech input. Mapping of speech features to model parameters is done using random forests for regression. We propose a new speech feature based on phonemic labels and acoustic features. The novel feature produces more expressive facial animation and it robustly handles temporal labeling errors. Furthermore, by employing a sliding window approach to feature extraction, the system is easy to train and allows for low-delay synthesis. We show that our novel combination of speech features improves visual speech synthesis. Our findings are confirmed by a subjective user study.",
keywords = "Facial Animation, Lip Synchronization, Speech Features, Visual Speech Synthesis",
author = "Felix Kuhnke and J{\"o}rn Ostermann",
year = "2017",
month = aug,
day = "28",
doi = "10.1109/icme.2017.8019546",
language = "English",
series = "Proceedings - IEEE International Conference on Multimedia and Expo",
publisher = "IEEE Computer Society",
pages = "1075--1080",
booktitle = "2017 IEEE International Conference on Multimedia and Expo",
address = "United States",
note = "2017 IEEE International Conference on Multimedia and Expo, ICME 2017 ; Conference date: 10-07-2017 Through 14-07-2017",

}

RIS

TY - GEN

T1 - Visual speech synthesis from 3D mesh sequences driven by combined speech features

AU - Kuhnke, Felix

AU - Ostermann, Jörn

PY - 2017/8/28

Y1 - 2017/8/28

N2 - Given a pre-registered 3D mesh sequence and accompanying phoneme-labeled audio, our system creates an animatable face model and a mapping procedure to produce realistic speech animations for arbitrary speech input. Mapping of speech features to model parameters is done using random forests for regression. We propose a new speech feature based on phonemic labels and acoustic features. The novel feature produces more expressive facial animation and it robustly handles temporal labeling errors. Furthermore, by employing a sliding window approach to feature extraction, the system is easy to train and allows for low-delay synthesis. We show that our novel combination of speech features improves visual speech synthesis. Our findings are confirmed by a subjective user study.

AB - Given a pre-registered 3D mesh sequence and accompanying phoneme-labeled audio, our system creates an animatable face model and a mapping procedure to produce realistic speech animations for arbitrary speech input. Mapping of speech features to model parameters is done using random forests for regression. We propose a new speech feature based on phonemic labels and acoustic features. The novel feature produces more expressive facial animation and it robustly handles temporal labeling errors. Furthermore, by employing a sliding window approach to feature extraction, the system is easy to train and allows for low-delay synthesis. We show that our novel combination of speech features improves visual speech synthesis. Our findings are confirmed by a subjective user study.

KW - Facial Animation

KW - Lip Synchronization

KW - Speech Features

KW - Visual Speech Synthesis

UR - http://www.scopus.com/inward/record.url?scp=85030238866&partnerID=8YFLogxK

U2 - 10.1109/icme.2017.8019546

DO - 10.1109/icme.2017.8019546

M3 - Conference contribution

AN - SCOPUS:85030238866

T3 - Proceedings - IEEE International Conference on Multimedia and Expo

SP - 1075

EP - 1080

BT - 2017 IEEE International Conference on Multimedia and Expo

PB - IEEE Computer Society

T2 - 2017 IEEE International Conference on Multimedia and Expo, ICME 2017

Y2 - 10 July 2017 through 14 July 2017

ER -
