Details
Original language | English |
---|---|
Pages (from-to) | 2330-2333 |
Number of pages | 4 |
Journal | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
Publication status | Published - 2008 |
Event | INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association - Brisbane, Australia |
Duration | 22 Sept 2008 → 26 Sept 2008 |
Abstract
This paper presents the optimization of the parameters of a talking head for web-based applications, such as Newsreader and E-commerce, in which a realistic talking head initiates a conversation with users. Our talking head system consists of two parts: analysis and synthesis. The audiovisual analysis part creates a face model of a recorded human subject, composed of a personalized 3D mask and a large database of mouth images with their related information. The synthesis part generates facial animation by concatenating appropriate mouth images from the database. A critical issue in the synthesis is unit selection, which chooses mouth images from the database such that they match the spoken words of the talking head. To achieve realistic facial animation, the unit selection has to be optimized. Objective criteria are proposed in this paper, and Pareto optimization is used to train the unit selection. Subjective tests are carried out in our web-based evaluation system. Experimental results show that most people cannot distinguish our facial animations from real videos.
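The abstract does not spell out the objective criteria or how the unit selection is parameterized, so the following is only a minimal sketch of the general idea behind Pareto optimization. It assumes each candidate parameter setting has already been scored on two hypothetical criteria where lower is better. The names `dominates` and `pareto_front` and all scores are illustrative and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if a is no worse than b on every criterion and
    strictly better on at least one (lower scores are better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Tuple[float, ...]]) -> List[int]:
    """Return the indices of all points that no other point dominates."""
    return [
        i
        for i, p in enumerate(points)
        if not any(dominates(q, p) for j, q in enumerate(points) if j != i)
    ]


# Hypothetical scores per candidate parameter setting:
# (criterion_1, criterion_2), both to be minimized.
candidates = [(0.30, 0.80), (0.50, 0.40), (0.60, 0.60), (0.90, 0.10)]

front = pareto_front(candidates)
print(front)  # indices of the non-dominated candidates
```

Under these assumptions, the setting `(0.60, 0.60)` is discarded because `(0.50, 0.40)` is better on both criteria. The remaining settings are trade-offs between the two criteria, and a final choice among them would need a further rule or a subjective test.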
Keywords
- Pareto optimization, Talking head, TTS (Text-to-Speech), Unit selection
ASJC Scopus subject areas
- Computer Science(all)
  - Human-Computer Interaction
  - Signal Processing
  - Software
- Neuroscience(all)
  - Sensory Systems
Cite this
Realistic facial animation system for interactive services. / Liu, Kang; Ostermann, Joern. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, 2008, p. 2330-2333.
Research output: Contribution to journal › Conference article › Research › peer review
TY - JOUR
T1 - Realistic facial animation system for interactive services
AU - Liu, Kang
AU - Ostermann, Joern
PY - 2008
Y1 - 2008
KW - Pareto optimization
KW - Talking head
KW - TTS (Text-to-Speech)
KW - Unit selection
UR - http://www.scopus.com/inward/record.url?scp=84867227937&partnerID=8YFLogxK
U2 - 10.21437/Interspeech.2008-594
DO - 10.21437/Interspeech.2008-594
M3 - Conference article
AN - SCOPUS:84867227937
SP - 2330
EP - 2333
JO - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
JF - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SN - 2308-457X
T2 - INTERSPEECH 2008 - 9th Annual Conference of the International Speech Communication Association
Y2 - 22 September 2008 through 26 September 2008
ER -