
On echo cancellation for dynamic spatial audio in telepresence systems

Research output: Thesis › Doctoral thesis

Authors

  • Marcel Martin Nophut

Details

Original language: English
Qualification: Doctor of Engineering
Awarding Institution: Leibniz University Hannover
Supervised by
  • Jürgen Karl Peissig, Supervisor
Date of Award: 6 Jun 2024
Place of Publication: Hannover
Publication status: Published - 29 Oct 2024

Abstract

Modern telepresence systems are increasingly approaching their desired ideal, in which physical distances between individuals or locations are bridged and the technical systems involved are no longer perceptible. In the audio domain of these systems, the aim is to provide a high degree of auditory immersion, including high-quality capture and reproduction of spatial audio scenes, which facilitates a natural experience of audio events and intuitive interaction and communication. In addition to a full-duplex acoustic connection between the connected locations, this requires a sophisticated setup of loudspeakers and microphones and the use of advanced signal processing methods from the field of spatial audio. However, if these telepresence systems are to fully exploit the potential of modern audio technologies, connected subsystems such as acoustic echo cancellation (AEC) must adapt to the challenges that arise. In this thesis, two implementations for audio telepresence are conceived that take different approaches to the recording of spatial audio. Both scenarios are then analyzed with respect to a connected AEC system, and possible system approaches are discussed. While modern AEC systems employ various signal processing techniques, adaptive filter algorithms are most often used to suppress the (dominant) linear echo components, which is why they are the main focus here. By leveraging different kinds of prior knowledge that can be derived from the respective scenarios, advanced methods based on modern adaptive filtering algorithms for AEC are developed and investigated to address these specific challenges. In particular, the following techniques are explored: First, in a reproduction system with higher-order Ambisonics (HOA) and a recording-side spherical microphone array, the wave-domain adaptive filtering (WDAF) technique is employed to address the challenges of computational complexity and the non-uniqueness problem (NUP). The generalized frequency-domain adaptive filter (GFDAF) serves as a starting point here. Second, a system is considered in which the sound source is captured through a single-channel lavalier or close-up microphone, combined with a position-tracking system, to create a spatial audio rendering. Here, the captured position data is utilized for a velocity-dependent control of the adaptive filter to enhance its tracking behavior. The underlying algorithm is the frequency-domain adaptive Kalman filter (FDKF). The experiments conducted to evaluate performance focus on, but are not limited to, practical scenarios and the use of measured rather than synthesized audio data. The presented approaches demonstrate significant advantages over reference methods; their benefits, as well as potential drawbacks and trade-offs, are discussed in detail.
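The linear echo suppression via adaptive filtering that the abstract refers to can be illustrated with a minimal single-channel normalized LMS (NLMS) echo canceller. This is a generic textbook sketch on synthetic data, not the thesis's WDAF, GFDAF, or FDKF methods; all signals, filter lengths, and parameters below are invented for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy far-end signal and a short decaying "room" impulse response (echo path).
far_end = rng.standard_normal(20000)
echo_path = rng.standard_normal(64) * np.exp(-0.1 * np.arange(64))
mic = np.convolve(far_end, echo_path)[: len(far_end)]  # echo-only microphone signal

# NLMS adaptive filter: identify the echo path and subtract the echo estimate.
L = 64          # adaptive filter length (matches the toy echo path here)
mu = 0.5        # step size
eps = 1e-6      # regularization against small input power
w = np.zeros(L)
err = np.zeros(len(far_end))
for n in range(L - 1, len(far_end)):
    x = far_end[n - L + 1 : n + 1][::-1]  # most recent far-end samples, newest first
    y_hat = w @ x                         # echo estimate
    e = mic[n] - y_hat                    # residual ("send" signal after cancellation)
    err[n] = e
    w += mu * e * x / (x @ x + eps)       # normalized gradient update

# Echo return loss enhancement (ERLE) over the last quarter, after convergence.
tail = slice(3 * len(far_end) // 4, None)
erle_db = 10 * np.log10(np.mean(mic[tail] ** 2) / (np.mean(err[tail] ** 2) + 1e-12))
```

In this noiseless, purely linear toy setup the filter converges to the true echo path and the ERLE becomes very large; the thesis's scenarios are harder precisely because multichannel reproduction (non-uniqueness) and moving sources (tracking) break the assumptions this simple sketch relies on.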

Cite this

On echo cancellation for dynamic spatial audio in telepresence systems. / Nophut, Marcel Martin.
Hannover, 2024. 116 p.


Nophut, MM 2024, 'On echo cancellation for dynamic spatial audio in telepresence systems', Doctor of Engineering, Leibniz University Hannover, Hannover. https://doi.org/10.15488/18093
Nophut, M. M. (2024). On echo cancellation for dynamic spatial audio in telepresence systems. [Doctoral thesis, Leibniz University Hannover]. https://doi.org/10.15488/18093
Nophut MM. On echo cancellation for dynamic spatial audio in telepresence systems. Hannover, 2024. 116 p. doi: 10.15488/18093
Nophut, Marcel Martin. / On echo cancellation for dynamic spatial audio in telepresence systems. Hannover, 2024. 116 p.
