Details
Original language | English |
---|---|
Title of host publication | 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 |
Pages | 1913-1917 |
Number of pages | 5 |
ISBN (electronic) | 9781713836902 |
Publication status | Published - 2021 |
Event | 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic; Duration: 30 Aug 2021 → 3 Sept 2021 |
Publication series
Name | Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH |
---|---|
Volume | 3 |
ISSN (Print) | 2308-457X |
ISSN (electronic) | 1990-9772 |
Abstract
Automatic speech recognition for children’s speech is challenging, mainly because publicly available child speech corpora are scarce and children’s speech shows wide inter- and intra-speaker variability in its acoustic and linguistic characteristics. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. Rather than only differentiating between the child and adult domains, we additionally use age information and thus force the acoustic model to learn age-invariant features. Our results on publicly available data sets show that this makes better use of existing data during training. We further show that adversarial multi-task learning should not necessarily be regarded as a substitute for traditional feature-space adaptation methods; both should be used together for best performance.
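The abstract does not say how the adversarial objective is implemented. One common way to realise adversarial multi-task learning is gradient reversal: the shared encoder follows the recognition gradient but moves against the gradient of an auxiliary age classifier. The following is a minimal toy sketch of that idea only, assuming plain Python lists as gradients; the function names, the weight `lam`, and the learning rate `lr` are hypothetical and are not taken from the paper.

```python
# Toy sketch of gradient reversal for adversarial multi-task learning.
# All names and the list-based "gradients" are hypothetical illustrations,
# not the authors' implementation.

def encoder_step(grad_asr, grad_age_enc, lam, lr):
    """Parameter step for the shared acoustic encoder.

    grad_asr:     gradient of the recognition loss w.r.t. encoder parameters
    grad_age_enc: gradient of the age-classifier loss w.r.t. the same parameters
    The age gradient is reversed (scaled by lam), so the encoder is pushed
    away from features that make age easy to predict.
    """
    return [-lr * (g_asr - lam * g_age)
            for g_asr, g_age in zip(grad_asr, grad_age_enc)]


def age_classifier_step(grad_age_cls, lr):
    """Parameter step for the age classifier, which minimises its own loss."""
    return [-lr * g for g in grad_age_cls]


if __name__ == "__main__":
    # With lam = 0 the encoder step reduces to ordinary gradient descent.
    print(encoder_step([1.0, 2.0], [0.5, -1.0], lam=0.5, lr=0.5))
    print(age_classifier_step([2.0, -4.0], lr=0.25))
```

In this sketch, `lam` controls how strongly the encoder is penalised for retaining age-predictive information; setting it to zero recovers plain single-task training.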
Keywords
- Child speech, Domain adaptation, Speech recognition
ASJC Scopus subject areas
- Arts and Humanities(all)
  - Language and Linguistics
- Computer Science(all)
  - Human-Computer Interaction
  - Signal Processing
  - Software
- Mathematics(all)
  - Modelling and Simulation
Cite this
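Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. / Rumberg, Lars; Ehlert, Hanna; Lüdtke, Ulrike; Ostermann, Jörn. 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. 2021. p. 1913-1917 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; Vol. 3).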
Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review
TY - GEN
T1 - Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning
AU - Rumberg, Lars
AU - Ehlert, Hanna
AU - Lüdtke, Ulrike
AU - Ostermann, Jörn
N1 - Publisher Copyright: Copyright © 2021 ISCA.
PY - 2021
Y1 - 2021
AB - Automatic speech recognition for children’s speech is a challenging task mainly due to scarcity of publicly available child speech corpora and wide inter- and intra-speaker variability in terms of acoustic and linguistic characteristics of children’s speech. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. We use age information additionally to just differentiating between the child and adult domains and thus force the acoustic model to learn age invariant features. Our results on publicly available data sets show that this leads to better leveraging of existing data during training. We further show that usage of adversarial multitask learning should not necessarily be regarded as a substitute for traditional feature space adaptation methods, but that both should be used together for best performance.
KW - Child speech
KW - Domain adaptation
KW - Speech recognition
UR - http://www.scopus.com/inward/record.url?scp=85119174913&partnerID=8YFLogxK
U2 - 10.21437/interspeech.2021-1241
DO - 10.21437/interspeech.2021-1241
M3 - Conference contribution
AN - SCOPUS:85119174913
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 1913
EP - 1917
BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Y2 - 30 August 2021 through 3 September 2021
ER -