Convoluational Transformer With Adaptive Position Embedding For Covid-19 Detection From Cough Sounds

Tianhao Yan; Hao Meng; Shuo Liu; Emilia Parada-Cabaleiro; Zhao Ren; Björn W. Schuller

doi:10.1109/icassp43922.2022.9747513

Details

Original language	English
Title of host publication	2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings
Publisher	Institute of Electrical and Electronics Engineers Inc.
Pages	9092-9096
Number of pages	5
ISBN (electronic)	9781665405409
ISBN (print)	978-1-6654-0541-6
Publication status	Published - 2022
Event	47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Virtual, Online, Singapore. Duration: 23 May 2022 → 27 May 2022

Publication series

Name	ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Volume	2022-May
ISSN (print)	1520-6149

Abstract

Covid-19 has caused a huge health crisis worldwide in the past two years. Although early detection of the virus through nucleic acid screening can considerably reduce its spread, the efficiency of this diagnostic process is limited by its complexity and cost. Hence, an effective and inexpensive method for the early detection of Covid-19 is still needed. Considering that the cough of an infected person contains a large amount of information, we propose an algorithm for the automatic recognition of Covid-19 from cough signals. Our approach generates static log-Mel spectrograms with deltas and delta-deltas from the cough signal and subsequently extracts feature maps through a Convolutional Neural Network (CNN). Following the advances in transformers in the realm of deep learning, our proposed architecture exploits a novel adaptive position embedding structure which can learn the position information of the features from the CNN output. Overlaying this embedding on the CNN output lets the transformer structure rapidly lock onto the feature locations to attend to, which yields better classification. The effectiveness of the proposed architecture is shown by its improvement over the baseline in our experiments on the INTERSPEECH 2021 Computational Paralinguistics Challenge CCS (Coughing Sub Challenge) database, where it reached 72.6 % UAR (Unweighted Average Recall).
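
The abstract reports performance as UAR (Unweighted Average Recall). As a minimal sketch of how that metric is commonly computed (the mean of the per-class recalls, so that each class counts equally regardless of its size), the Python below uses hypothetical toy labels that are not taken from the paper or the challenge data; the function name is likewise illustrative and not the authors' code.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Return the mean of per-class recalls over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    total = defaultdict(int)    # number of samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1

    # Recall of each class, then an unweighted mean across classes.
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


# Hypothetical, imbalanced toy data: 8 negative and 2 positive samples.
y_true = ["negative"] * 8 + ["positive"] * 2
y_pred = ["negative"] * 8 + ["negative", "positive"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"UAR = {uar:.3f}, accuracy = {accuracy:.3f}")
```

On this toy example plain accuracy is 0.9, while UAR is 0.75, because the single missed positive sample halves the recall of the minority class. That sensitivity to small classes is a common reason for preferring UAR on imbalanced data.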

ASJC Scopus subject areas

Computer Science (all)
Software
Computer Science (all)
Signal Processing
Engineering (all)
Electrical and Electronic Engineering

Cite this

Convoluational Transformer With Adaptive Position Embedding For Covid-19 Detection From Cough Sounds. / Yan, Tianhao; Meng, Hao; Liu, Shuo et al.
2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 9092-9096 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2022-May).

Research output: Chapter in book/report/conference proceeding › Conference contribution › Research › Peer review

Yan, T, Meng, H, Liu, S, Parada-Cabaleiro, E, Ren, Z & Schuller, BW 2022, Convoluational Transformer With Adaptive Position Embedding For Covid-19 Detection From Cough Sounds. in 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, vol. 2022-May, Institute of Electrical and Electronics Engineers Inc., pp. 9092-9096, 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022, Virtual, Online, Singapore, 23 May 2022. https://doi.org/10.1109/icassp43922.2022.9747513

Yan, T., Meng, H., Liu, S., Parada-Cabaleiro, E., Ren, Z., & Schuller, B. W. (2022). Convoluational Transformer With Adaptive Position Embedding For Covid-19 Detection From Cough Sounds. In 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings (pp. 9092-9096). (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings; Vol. 2022-May). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/icassp43922.2022.9747513

Yan T, Meng H, Liu S, Parada-Cabaleiro E, Ren Z, Schuller BW. Convoluational Transformer With Adaptive Position Embedding For Covid-19 Detection From Cough Sounds. in 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc. 2022. p. 9092-9096. (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings). doi: 10.1109/icassp43922.2022.9747513

Yan, Tianhao ; Meng, Hao ; Liu, Shuo et al. / Convoluational Transformer With Adaptive Position Embedding For Covid-19 Detection From Cough Sounds. 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings. Institute of Electrical and Electronics Engineers Inc., 2022. pp. 9092-9096 (ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings).

@inproceedings{b01c79704474432b90e54cca438dd5a2,

title = "Convoluational Transformer With Adaptive Position Embedding For Covid-19 Detection From Cough Sounds",

abstract = "Covid-19 has caused a huge health crisis worldwide in the past two years. Although early detection of the virus through nucleic acid screening can considerably reduce its spread, the efficiency of this diagnostic process is limited by its complexity and cost. Hence, an effective and inexpensive method for the early detection of Covid-19 is still needed. Considering that the cough of an infected person contains a large amount of information, we propose an algorithm for the automatic recognition of Covid-19 from cough signals. Our approach generates static log-Mel spectrograms with deltas and delta-deltas from the cough signal and subsequently extracts feature maps through a Convolutional Neural Network (CNN). Following the advances in transformers in the realm of deep learning, our proposed architecture exploits a novel adaptive position embedding structure which can learn the position information of the features from the CNN output. Overlaying this embedding on the CNN output lets the transformer structure rapidly lock onto the feature locations to attend to, which yields better classification. The effectiveness of the proposed architecture is shown by its improvement over the baseline in our experiments on the INTERSPEECH 2021 Computational Paralinguistics Challenge CCS (Coughing Sub Challenge) database, where it reached 72.6 % UAR (Unweighted Average Recall).",

keywords = "Adaptive Position Embedding Transformer, Computer Audition, Convolutional Neural Network, log-Mel Spectrogram, SARS-CoV2 Detection",

author = "Tianhao Yan and Hao Meng and Shuo Liu and Emilia Parada-Cabaleiro and Zhao Ren and Schuller, {Bj{\"o}rn W.}",

year = "2022",

doi = "10.1109/icassp43922.2022.9747513",

language = "English",

isbn = "978-1-6654-0541-6",

series = "ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

pages = "9092--9096",

booktitle = "2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings",

address = "United States",

note = "47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 ; Conference date: 23-05-2022 Through 27-05-2022",

}

TY - GEN

T1 - Convoluational Transformer With Adaptive Position Embedding For Covid-19 Detection From Cough Sounds

AU - Yan, Tianhao

AU - Meng, Hao

AU - Liu, Shuo

AU - Parada-Cabaleiro, Emilia

AU - Ren, Zhao

AU - Schuller, Björn W.

PY - 2022

Y1 - 2022

AB - Covid-19 has caused a huge health crisis worldwide in the past two years. Although early detection of the virus through nucleic acid screening can considerably reduce its spread, the efficiency of this diagnostic process is limited by its complexity and cost. Hence, an effective and inexpensive method for the early detection of Covid-19 is still needed. Considering that the cough of an infected person contains a large amount of information, we propose an algorithm for the automatic recognition of Covid-19 from cough signals. Our approach generates static log-Mel spectrograms with deltas and delta-deltas from the cough signal and subsequently extracts feature maps through a Convolutional Neural Network (CNN). Following the advances in transformers in the realm of deep learning, our proposed architecture exploits a novel adaptive position embedding structure which can learn the position information of the features from the CNN output. Overlaying this embedding on the CNN output lets the transformer structure rapidly lock onto the feature locations to attend to, which yields better classification. The effectiveness of the proposed architecture is shown by its improvement over the baseline in our experiments on the INTERSPEECH 2021 Computational Paralinguistics Challenge CCS (Coughing Sub Challenge) database, where it reached 72.6 % UAR (Unweighted Average Recall).

KW - Adaptive Position Embedding Transformer

KW - Computer Audition

KW - Convolutional Neural Network

KW - log-Mel Spectrogram

KW - SARS-CoV2 Detection

UR - http://www.scopus.com/inward/record.url?scp=85131243911&partnerID=8YFLogxK

U2 - 10.1109/icassp43922.2022.9747513

DO - 10.1109/icassp43922.2022.9747513

M3 - Conference contribution

AN - SCOPUS:85131243911

SN - 978-1-6654-0541-6

T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings

SP - 9092

EP - 9096

BT - 2022 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022 - Proceedings

PB - Institute of Electrical and Electronics Engineers Inc.

T2 - 47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022

Y2 - 23 May 2022 through 27 May 2022

ER -

Research@Leibniz University
