Evaluation of an image-based talking head with realistic facial expression and head motion

Research output: Contribution to journal › Article › Research › Peer review

Authors

Kang Liu, Joern Ostermann

Research Organisations


Details

Original language: English
Pages (from-to): 37-44
Number of pages: 8
Journal: Journal on Multimodal User Interfaces
Volume: 5
Publication status: Published - 29 Oct 2011

Abstract

In this paper, we present an image-based talking head system that synthesizes flexible head motion and realistic facial expressions accompanying speech, given arbitrary text input and control tags. The goal of facial animation synthesis is to generate lip-synchronized, natural animations. The talking head is evaluated both objectively and subjectively. The objective evaluation measures lip synchronization by matching lip closures between the synthesized sequences and the real ones, since human viewers are very sensitive to closures, and getting the closures at the right time may be the most important objective criterion for providing the impression that lips and sound are synchronized. In the subjective tests, facial expression is evaluated by scoring real and synthesized videos, and head movement is evaluated by scoring animations with flexible head motion against animations with repeated head motion. Experimental results show that the proposed objective measurement of lip closure is one of the most significant criteria for the subjective evaluation of animations. The animated facial expressions are subjectively indistinguishable from real ones, and talking heads with flexible head motion are more realistic and lifelike than those with repeated head motion.
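
As a rough, hypothetical illustration of the closure-matching idea described in the abstract, the sketch below counts how many lip closures in a real mouth-opening trace are reproduced within a few frames in a synthesized trace. The signal representation (normalized mouth opening per frame), the closure threshold, and the matching window are assumptions made for illustration only; they are not the metric defined in the paper.

# Hypothetical lip-closure matching sketch; threshold, window and
# mouth-opening signals are illustrative assumptions, not the paper's metric.
import numpy as np


def closure_frames(mouth_opening, threshold=0.1):
    """Indices of frames whose mouth opening falls below a threshold,
    i.e. candidate lip closures (e.g. bilabials such as /p/, /b/, /m/)."""
    opening = np.asarray(mouth_opening, dtype=float)
    return np.flatnonzero(opening < threshold)


def closure_match_rate(real_opening, synth_opening, threshold=0.1, window=2):
    """Fraction of closures in the real sequence that have a synthesized
    closure within +/- `window` frames; 1.0 means every real closure is
    reproduced at approximately the right time."""
    real = closure_frames(real_opening, threshold)
    synth = closure_frames(synth_opening, threshold)
    if real.size == 0:
        return 1.0  # nothing to match
    if synth.size == 0:
        return 0.0
    hits = sum(np.any(np.abs(synth - r) <= window) for r in real)
    return hits / real.size


if __name__ == "__main__":
    # Toy mouth-opening traces (normalized lip distance per frame).
    real = [0.5, 0.4, 0.05, 0.3, 0.6, 0.04, 0.5]
    synth = [0.5, 0.05, 0.3, 0.3, 0.6, 0.06, 0.5]
    print(f"closure match rate: {closure_match_rate(real, synth):.2f}")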

Keywords

    Facial expression, Head motion, Objective and subjective evaluation, Talking head

ASJC Scopus subject areas

Cite this

Evaluation of an image-based talking head with realistic facial expression and head motion. / Liu, Kang; Ostermann, Joern.
In: Journal on Multimodal User Interfaces, Vol. 5, 29.10.2011, p. 37-44.

Research output: Contribution to journal › Article › Research › Peer review

BibTeX
@article{e3147940fc5b4824b9ecfc62c236d5d4,
title = "Evaluation of an image-based talking head with realistic facial expression and head motion",
abstract = "In this paper, we present an image-based talking head system that synthesizes flexible head motion and realistic facial expressions accompanying speech, given arbitrary text input and control tags. The goal of facial animation synthesis is to generate lip-synchronized, natural animations. The talking head is evaluated both objectively and subjectively. The objective evaluation measures lip synchronization by matching lip closures between the synthesized sequences and the real ones, since human viewers are very sensitive to closures, and getting the closures at the right time may be the most important objective criterion for providing the impression that lips and sound are synchronized. In the subjective tests, facial expression is evaluated by scoring real and synthesized videos, and head movement is evaluated by scoring animations with flexible head motion against animations with repeated head motion. Experimental results show that the proposed objective measurement of lip closure is one of the most significant criteria for the subjective evaluation of animations. The animated facial expressions are subjectively indistinguishable from real ones, and talking heads with flexible head motion are more realistic and lifelike than those with repeated head motion.",
keywords = "Facial expression, Head motion, Objective and subjective evaluation, Talking head",
author = "Kang Liu and Joern Ostermann",
note = "Funding information: This work is funded by the German Research Foundation (DFG Sachbeihilfe OS295/3-1). This work has been partially supported by the EC within FP6 under Grant 511568 with the acronym 3DTV.",
year = "2011",
month = oct,
day = "29",
doi = "10.1007/s12193-011-0070-8",
language = "English",
volume = "5",
pages = "37--44",
journal = "Journal on Multimodal User Interfaces",
issn = "1783-7677",
publisher = "Springer Verlag",

}

RIS

TY - JOUR

T1 - Evaluation of an image-based talking head with realistic facial expression and head motion

AU - Liu, Kang

AU - Ostermann, Joern

N1 - Funding information: This work is funded by the German Research Foundation (DFG Sachbeihilfe OS295/3-1). This work has been partially supported by the EC within FP6 under Grant 511568 with the acronym 3DTV.

PY - 2011/10/29

Y1 - 2011/10/29

N2 - In this paper, we present an image-based talking head system that synthesizes flexible head motion and realistic facial expressions accompanying speech, given arbitrary text input and control tags. The goal of facial animation synthesis is to generate lip-synchronized, natural animations. The talking head is evaluated both objectively and subjectively. The objective evaluation measures lip synchronization by matching lip closures between the synthesized sequences and the real ones, since human viewers are very sensitive to closures, and getting the closures at the right time may be the most important objective criterion for providing the impression that lips and sound are synchronized. In the subjective tests, facial expression is evaluated by scoring real and synthesized videos, and head movement is evaluated by scoring animations with flexible head motion against animations with repeated head motion. Experimental results show that the proposed objective measurement of lip closure is one of the most significant criteria for the subjective evaluation of animations. The animated facial expressions are subjectively indistinguishable from real ones, and talking heads with flexible head motion are more realistic and lifelike than those with repeated head motion.

AB - In this paper, we present an image-based talking head system that synthesizes flexible head motion and realistic facial expressions accompanying speech, given arbitrary text input and control tags. The goal of facial animation synthesis is to generate lip-synchronized, natural animations. The talking head is evaluated both objectively and subjectively. The objective evaluation measures lip synchronization by matching lip closures between the synthesized sequences and the real ones, since human viewers are very sensitive to closures, and getting the closures at the right time may be the most important objective criterion for providing the impression that lips and sound are synchronized. In the subjective tests, facial expression is evaluated by scoring real and synthesized videos, and head movement is evaluated by scoring animations with flexible head motion against animations with repeated head motion. Experimental results show that the proposed objective measurement of lip closure is one of the most significant criteria for the subjective evaluation of animations. The animated facial expressions are subjectively indistinguishable from real ones, and talking heads with flexible head motion are more realistic and lifelike than those with repeated head motion.

KW - Facial expression

KW - Head motion

KW - Objective and subjective evaluation

KW - Talking head

UR - http://www.scopus.com/inward/record.url?scp=84858439723&partnerID=8YFLogxK

U2 - 10.1007/s12193-011-0070-8

DO - 10.1007/s12193-011-0070-8

M3 - Article

AN - SCOPUS:84858439723

VL - 5

SP - 37

EP - 44

JO - Journal on Multimodal User Interfaces

JF - Journal on Multimodal User Interfaces

SN - 1783-7677

ER -
