Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning

Lars Rumberg; Hanna Ehlert; Ulrike Lüdtke; Jörn Ostermann

doi:10.21437/interspeech.2021-1241

Details

Original language	English
Title of host publication	22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021
Pages	1913-1917
Number of pages	5
ISBN (electronic)	9781713836902
Publication status	Published - 2021
Event	22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 - Brno, Czech Republic
Duration	30 Aug 2021 → 3 Sept 2021

Publication series

Name	Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume	3
ISSN (Print)	2308-457X
ISSN (electronic)	1990-9772

Abstract

Automatic speech recognition for children’s speech is challenging, mainly because publicly available child speech corpora are scarce and because the acoustic and linguistic characteristics of children’s speech vary widely both between and within speakers. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. In addition to distinguishing between the child and adult domains, we use age information and thus force the acoustic model to learn age-invariant features. Our results on publicly available data sets show that this makes better use of existing data during training. We further show that adversarial multi-task learning should not necessarily be regarded as a substitute for traditional feature-space adaptation methods, but that both should be used together for best performance.
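
The abstract does not spell out how the adversarial objective is implemented. As a minimal sketch, assuming a gradient reversal formulation (a common way to realise adversarial multi-task training, not confirmed by this record), the shared acoustic encoder could be updated with the recognition gradient plus the sign-flipped, scaled gradient of an age classifier. All names (`reverse_gradient`, `encoder_step`, `lam`, `lr`) and all numbers are hypothetical.

```python
from typing import List


def reverse_gradient(grad: List[float], lam: float) -> List[float]:
    # Backward pass of a gradient reversal layer (assumed mechanism):
    # identity in the forward pass, gradient multiplied by -lam going back.
    return [-lam * g for g in grad]


def encoder_step(params: List[float], grad_asr: List[float],
                 grad_age: List[float], lam: float, lr: float) -> List[float]:
    # The encoder descends the recognition loss but ascends the age loss,
    # which pushes its features towards carrying less age information.
    adv = reverse_gradient(grad_age, lam)
    return [p - lr * (ga + gv) for p, ga, gv in zip(params, grad_asr, adv)]


# Toy values, purely illustrative (not taken from the paper).
new_params = encoder_step([1.0, 2.0], [0.5, 0.0], [1.0, -1.0], lam=0.5, lr=0.1)
print(new_params)
```

In such a setup the age classifier itself would still be trained with the unreversed gradient, and `lam` would trade recognition accuracy against invariance.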

Keywords

Child speech, Domain adaptation, Speech recognition

ASJC Scopus subject areas

Arts and Humanities (all): Language and Linguistics
Computer Science (all): Human-Computer Interaction; Signal Processing; Software
Mathematics (all): Modelling and Simulation

Cite this

Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. / Rumberg, Lars; Ehlert, Hanna; Lüdtke, Ulrike et al.
22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. 2021. p. 1913-1917 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; Vol. 3).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › peer review

Rumberg, L, Ehlert, H, Lüdtke, U & Ostermann, J 2021, Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. in 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, vol. 3, pp. 1913-1917, 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021, Brno, Czech Republic, 30 Aug 2021. https://doi.org/10.21437/interspeech.2021-1241

Rumberg, L., Ehlert, H., Lüdtke, U., & Ostermann, J. (2021). Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. In 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 (pp. 1913-1917). (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH; Vol. 3). https://doi.org/10.21437/interspeech.2021-1241

Rumberg L, Ehlert H, Lüdtke U, Ostermann J. Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. In 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. 2021. p. 1913-1917. (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH). doi: 10.21437/interspeech.2021-1241

Rumberg, Lars ; Ehlert, Hanna ; Lüdtke, Ulrike et al. / Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning. 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021. 2021. pp. 1913-1917 (Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH).

@inproceedings{78446b345f734c27999fe407bd76232a,

title = "Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning",

abstract = "Automatic speech recognition for children{\textquoteright}s speech is a challenging task mainly due to scarcity of publicly available child speech corpora and wide inter- and intra-speaker variability in terms of acoustic and linguistic characteristics of children{\textquoteright}s speech. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. We use age information additionally to just differentiating between the child and adult domains and thus force the acoustic model to learn age invariant features. Our results on publicly available data sets show that this leads to better leveraging of existing data during training. We further show that usage of adversarial multitask learning should not necessarily be regarded as a substitute for traditional feature space adaptation methods, but that both should be used together for best performance.",

keywords = "Child speech, Domain adaptation, Speech recognition",

author = "Lars Rumberg and Hanna Ehlert and Ulrike L{\"u}dtke and J{\"o}rn Ostermann",

note = "Publisher Copyright: Copyright {\textcopyright} 2021 ISCA.; 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021 ; Conference date: 30-08-2021 Through 03-09-2021",

year = "2021",

doi = "10.21437/interspeech.2021-1241",

language = "English",

series = "Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",

pages = "1913--1917",

booktitle = "22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021",

}

TY - GEN

T1 - Age-invariant training for end-to-end child speech recognition using adversarial multi-task learning

AU - Rumberg, Lars

AU - Ehlert, Hanna

AU - Lüdtke, Ulrike

AU - Ostermann, Jörn

PY - 2021

Y1 - 2021

N2 - Automatic speech recognition for children’s speech is a challenging task mainly due to scarcity of publicly available child speech corpora and wide inter- and intra-speaker variability in terms of acoustic and linguistic characteristics of children’s speech. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. We use age information additionally to just differentiating between the child and adult domains and thus force the acoustic model to learn age invariant features. Our results on publicly available data sets show that this leads to better leveraging of existing data during training. We further show that usage of adversarial multitask learning should not necessarily be regarded as a substitute for traditional feature space adaptation methods, but that both should be used together for best performance.

AB - Automatic speech recognition for children’s speech is a challenging task mainly due to scarcity of publicly available child speech corpora and wide inter- and intra-speaker variability in terms of acoustic and linguistic characteristics of children’s speech. We propose a framework for age-invariant training of the acoustic model of end-to-end speech recognition systems based on adversarial multi-task learning. We use age information additionally to just differentiating between the child and adult domains and thus force the acoustic model to learn age invariant features. Our results on publicly available data sets show that this leads to better leveraging of existing data during training. We further show that usage of adversarial multitask learning should not necessarily be regarded as a substitute for traditional feature space adaptation methods, but that both should be used together for best performance.

KW - Child speech

KW - Domain adaptation

KW - Speech recognition

UR - http://www.scopus.com/inward/record.url?scp=85119174913&partnerID=8YFLogxK

U2 - 10.21437/interspeech.2021-1241

DO - 10.21437/interspeech.2021-1241

M3 - Conference contribution

AN - SCOPUS:85119174913

T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH

SP - 1913

EP - 1917

BT - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021

T2 - 22nd Annual Conference of the International Speech Communication Association, INTERSPEECH 2021

Y2 - 30 August 2021 through 3 September 2021

ER -
