On echo cancellation for dynamic spatial audio in telepresence systems

Marcel Martin Nophut

doi:10.15488/18093

Details

Original language	English
Qualification	Doctor of Engineering (Doktor der Ingenieurwissenschaften)
Degree-granting institution	Leibniz Universität Hannover
Supervised by	Jürgen Karl Peissig, supervisor
Date of award	6 June 2024
Place of publication	Hannover
Publication status	Published - 29 Oct 2024

Abstract

Modern telepresence systems are increasingly approaching their intended ideal, in which physical distances between people or places are bridged and the technical systems involved are no longer perceptible. For the audio systems used, the goal is to provide a high degree of auditory immersion, including high-quality capture and reproduction of spatial audio scenes that enable a natural experience of audio events and intuitive interaction and communication. In addition to a full-duplex acoustic connection between the connected locations, this requires a sophisticated setup of loudspeakers and microphones and the use of advanced signal processing methods from the field of spatial audio. However, if these telepresence systems exploit the full potential of modern audio technologies, connected systems such as acoustic echo cancellation (AEC) must adapt to the resulting challenges. In this thesis, two implementations for audio telepresence are designed that pursue different approaches to spatial audio recording. Both scenarios are then analyzed with respect to a connected AEC system, and possible system approaches are discussed. Although modern AEC systems employ several different signal processing techniques, adaptive filter algorithms are often the ones used to suppress the (dominant) linear echo components, which is why this technique is the focus here. By exploiting different kinds of prior knowledge that can be derived from the respective scenarios, extended methods based on modern adaptive filter algorithms for AEC are developed and investigated to address these specific challenges.
Specifically, the following techniques are covered. First, in a reproduction system with higher-order Ambisonics (HOA) and a spherical microphone array on the recording side, wave-domain adaptive filtering (WDAF) is used to address the challenges of computational complexity and the non-uniqueness problem (NUP). The generalized frequency-domain adaptive filter (GFDAF) serves as the starting point. Second, a system is considered in which the sound source is captured by a single-channel lavalier or close-up microphone and combined with a position-tracking system to produce a spatial audio rendering. Here, the captured position data is used for velocity-dependent control of the adaptive filter in order to improve its tracking behavior. The underlying algorithm is the frequency-domain adaptive Kalman filter (FDKF). The experiments conducted to evaluate performance focus on, without being limited to, practical scenarios and the use of measured rather than synthesized audio data. The presented approaches are shown to offer clear advantages over the reference methods. These advantages, as well as possible drawbacks and trade-offs, are discussed in detail.

Cite

On echo cancellation for dynamic spatial audio in telepresence systems. / Nophut, Marcel Martin.
Hannover, 2024. 116 p.

Publication: Thesis › Doctoral thesis

Nophut, MM 2024, 'On echo cancellation for dynamic spatial audio in telepresence systems', Doctor of Engineering, Gottfried Wilhelm Leibniz Universität Hannover, Hannover. https://doi.org/10.15488/18093

Nophut, M. M. (2024). On echo cancellation for dynamic spatial audio in telepresence systems. [Dissertation, Gottfried Wilhelm Leibniz Universität Hannover]. https://doi.org/10.15488/18093

Nophut MM. On echo cancellation for dynamic spatial audio in telepresence systems. Hannover, 2024. 116 p. doi: 10.15488/18093

Nophut, Marcel Martin. / On echo cancellation for dynamic spatial audio in telepresence systems. Hannover, 2024. 116 p.

@phdthesis{11bde0d5b11d4f77bb2f7c9b20a48a1d,
title = "On echo cancellation for dynamic spatial audio in telepresence systems",
abstract = "Modern telepresence systems are increasingly approaching their desired ideal, where physical distances between individuals or locations are bridged, and the involved technical systems are no longer perceptible. In the audio domain of these systems, the aim is to provide a high degree of auditory immersion, including high-quality capture and reproduction of spatial audio scenes, which facilitates a natural experience of audio events and intuitive interaction and communication. In addition to a full-duplex acoustic connection between the connected locations, this requires a sophisticated setup of loudspeakers and microphones and the use of advanced signal processing methods from the field of spatial audio. However, if these telepresence systems fully exploit the potential of modern audio technologies, connected systems such as acoustic echo cancellation (AEC) must adapt to the challenges that arise. In this thesis, two implementations for audio telepresence are conceived that take different approaches to the recording of spatial audio. In the following, both scenarios are analyzed with respect to a connected AEC system and possible system approaches are discussed. While modern AEC systems employ various signal processing techniques, it is often adaptive filter algorithms that are used to suppress the (dominant) linear echo components, which is why this technique is the main focus here. By leveraging different kinds of prior knowledge that can be derived from the respective scenarios, advanced methods based on modern adaptive filtering algorithms for AEC are developed and investigated to address these specific challenges. In particular, the following techniques are explored: Firstly, in a reproduction system with higher-order Ambisonics (HOA) and a recording-side spherical microphone array, the wave-domain adaptive filtering (WDAF) technique is employed to address the challenges of computational complexity and non-uniqueness problem (NUP). The generalized frequency-domain adaptive filter (GFDAF) serves as a starting point here. Secondly, a system is considered where the sound source is captured through a single-channel lavalier or closeup microphone, combined with a position-tracking system for creating a spatial audio rendering. Here, the captured position data is utilized for a velocity-dependent control of the adaptive filter to enhance its tracking behavior. The underlying algorithm is the frequency-domain adaptive Kalman filter (FDKF). The experiments conducted to evaluate the performance are focused on, but are not limited to, practical scenarios and the use of measured and non-synthesized audio data. The presented approaches demonstrate significant advantages over reference methods, where the benefits, but also potential drawbacks and trade-offs are discussed in detail.",
author = "Nophut, {Marcel Martin}",
year = "2024",
month = oct,
day = "29",
doi = "10.15488/18093",
language = "English",
school = "Leibniz University Hannover",
}

TY - BOOK
T1 - On echo cancellation for dynamic spatial audio in telepresence systems
AU - Nophut, Marcel Martin
PY - 2024/10/29
Y1 - 2024/10/29
N2 - Modern telepresence systems are increasingly approaching their desired ideal, where physical distances between individuals or locations are bridged, and the involved technical systems are no longer perceptible. In the audio domain of these systems, the aim is to provide a high degree of auditory immersion, including high-quality capture and reproduction of spatial audio scenes, which facilitates a natural experience of audio events and intuitive interaction and communication. In addition to a full-duplex acoustic connection between the connected locations, this requires a sophisticated setup of loudspeakers and microphones and the use of advanced signal processing methods from the field of spatial audio. However, if these telepresence systems fully exploit the potential of modern audio technologies, connected systems such as acoustic echo cancellation (AEC) must adapt to the challenges that arise. In this thesis, two implementations for audio telepresence are conceived that take different approaches to the recording of spatial audio. In the following, both scenarios are analyzed with respect to a connected AEC system and possible system approaches are discussed. While modern AEC systems employ various signal processing techniques, it is often adaptive filter algorithms that are used to suppress the (dominant) linear echo components, which is why this technique is the main focus here. By leveraging different kinds of prior knowledge that can be derived from the respective scenarios, advanced methods based on modern adaptive filtering algorithms for AEC are developed and investigated to address these specific challenges. In particular, the following techniques are explored: Firstly, in a reproduction system with higher-order Ambisonics (HOA) and a recording-side spherical microphone array, the wave-domain adaptive filtering (WDAF) technique is employed to address the challenges of computational complexity and non-uniqueness problem (NUP). The generalized frequency-domain adaptive filter (GFDAF) serves as a starting point here. Secondly, a system is considered where the sound source is captured through a single-channel lavalier or closeup microphone, combined with a position-tracking system for creating a spatial audio rendering. Here, the captured position data is utilized for a velocity-dependent control of the adaptive filter to enhance its tracking behavior. The underlying algorithm is the frequency-domain adaptive Kalman filter (FDKF). The experiments conducted to evaluate the performance are focused on, but are not limited to, practical scenarios and the use of measured and non-synthesized audio data. The presented approaches demonstrate significant advantages over reference methods, where the benefits, but also potential drawbacks and trade-offs are discussed in detail.
U2 - 10.15488/18093
DO - 10.15488/18093
M3 - Doctoral thesis
CY - Hannover
ER -
