Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Authors

Lars Rumberg, Hanna Ehlert, Ulrike Lüdtke, Jörn Ostermann

Details

Original language: English
Title of host publication: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Pages: 1913-1917
Number of pages: 5
ISBN (electronic): 9781713836902
Publication status: Published - 2021
Event: 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration: 30 Aug 2021 to 3 Sept 2021

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 3
ISSN (Print): 2308-457X
ISSN (electronic): 1990-9772

Abstract

Automatic speech recognition for children’s speech is challenging, mainly because publicly available child speech corpora are scarce and because children’s speech shows wide inter- and intra-speaker variability in its acoustic and linguistic characteristics. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. Rather than only distinguishing between the child and adult domains, we additionally use age information, which forces the acoustic model to learn age-invariant features. Our results on publicly available data sets show that this makes better use of existing data during training. We further show that adversarial multi-task learning should not necessarily be regarded as a substitute for traditional feature-space adaptation methods; both should be used together for best performance.
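
The abstract does not say how the adversarial objective is implemented. One common way to realize adversarial multi-task learning is gradient reversal: the shared encoder follows the gradient of the main task but moves against the gradient of an auxiliary age predictor. The toy scalar sketch below illustrates only that update rule. The function name, the squared-error losses, the treatment of age as a regression target, and the weight `lam` are all illustrative assumptions and are not taken from the paper.

```python
# Toy scalar sketch of one adversarial multi-task gradient step.
# All names and the squared-error losses are illustrative assumptions,
# not taken from the paper.

def adversarial_gradients(x, y, age, w, u, v, lam):
    """Return gradients for encoder weight w, task head u, and age head v.

    Shared feature:  h = w * x
    Task loss:       (u * h - y) ** 2      (stand-in for the ASR loss)
    Adversary loss:  (v * h - age) ** 2    (stand-in for the age predictor)
    """
    h = w * x
    task_residual = u * h - y
    age_residual = v * h - age

    # Each head minimizes its own loss as usual.
    grad_u = 2.0 * task_residual * h
    grad_v = 2.0 * age_residual * h

    # Gradients of both losses with respect to the shared encoder weight.
    task_grad_w = 2.0 * task_residual * u * x
    age_grad_w = 2.0 * age_residual * v * x

    # Gradient reversal: the encoder follows the task gradient but moves
    # against the adversary's gradient, scaled by lam.
    grad_w = task_grad_w - lam * age_grad_w
    return {"w": grad_w, "u": grad_u, "v": grad_v}


if __name__ == "__main__":
    grads = adversarial_gradients(
        x=1.0, y=2.0, age=3.0, w=1.0, u=1.0, v=1.0, lam=1.0
    )
    print(grads)
```

With `lam` set to 0 the encoder update reduces to ordinary single-task training, so in this sketch `lam` controls how strongly age information is pushed out of the shared feature.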

Keywords

    Child speech, Domain adaptation, Speech recognition

Cite this

Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. / Rumberg, Lars; Ehlert, Hanna; Lüdtke, Ulrike et al.
22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. 2021. p. 1913-1917 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; Vol. 3).

Rumberg, L, Ehlert, H, Lüdtke, U & Ostermann, J 2021, Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. in 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 3, pp. 1913-1917, 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021, Brno, Czech Republic, 30 Aug 2021. https://doi.org/10.21437/Interspeech.2021-1241
Rumberg, L., Ehlert, H., Lüdtke, U., & Ostermann, J. (2021). Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. In 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 (pp. 1913-1917). (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; Vol. 3). https://doi.org/10.21437/Interspeech.2021-1241
Rumberg L, Ehlert H, Lüdtke U, Ostermann J. Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. In 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. 2021. p. 1913-1917. (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH). doi: 10.21437/Interspeech.2021-1241
Rumberg, Lars ; Ehlert, Hanna ; Lüdtke, Ulrike et al. / Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. 2021. pp. 1913-1917 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH).
@inproceedings{78446b345f734c27999fe407bd76232a,
title = "Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning",
abstract = "Automatic speech recognition for children{\textquoteright}s speech is a challenging task mainly due to scarcity of publicly available child speech corpora and wide inter- and intra-speaker variability in terms of acoustic and linguistic characteristics of children{\textquoteright}s speech. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. We use age information additionally to just differentiating between the child and adult domains and thus force the acoustic model to learn age invariant features. Our results on publicly available data sets show that this leads to better leveraging of existing data during training. We further show that usage of adversarial multitask learning should not necessarily be regarded as a substitute for traditional feature space adaptation methods, but that both should be used together for best performance.",
keywords = "Child speech, Domain adaptation, Speech recognition",
author = "Lars Rumberg and Hanna Ehlert and Ulrike L{\"u}dtke and J{\"o}rn Ostermann",
note = "Publisher Copyright: Copyright {\textcopyright} 2021 ISCA.; 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 ; Conference date: 30-08-2021 Through 03-09-2021",
year = "2021",
doi = "10.21437/interspeech.2021-1241",
language = "English",
series = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
pages = "1913--1917",
booktitle = "22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021",

}

TY - GEN

T1 - Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning

AU - Rumberg, Lars

AU - Ehlert, Hanna

AU - Lüdtke, Ulrike

AU - Ostermann, Jörn

N1 - Publisher Copyright: Copyright © 2021 ISCA.

PY - 2021

Y1 - 2021

AB - Automatic speech recognition for children’s speech is a challenging task mainly due to scarcity of publicly available child speech corpora and wide inter- and intra-speaker variability in terms of acoustic and linguistic characteristics of children’s speech. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. We use age information additionally to just differentiating between the child and adult domains and thus force the acoustic model to learn age invariant features. Our results on publicly available data sets show that this leads to better leveraging of existing data during training. We further show that usage of adversarial multitask learning should not necessarily be regarded as a substitute for traditional feature space adaptation methods, but that both should be used together for best performance.

KW - Child speech

KW - Domain adaptation

KW - Speech recognition

UR - http://www.scopus.com/inward/record.url?scp=85119174913&partnerID=8YFLogxK

U2 - 10.21437/interspeech.2021-1241

DO - 10.21437/interspeech.2021-1241

M3 - Conference contribution

AN - SCOPUS:85119174913

T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SP - 1913

EP - 1917

BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021

T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021

Y2 - 30 August 2021 through 3 September 2021

ER -
