Details
Original language | English |
---|---|
Journal | International Journal of Social Research Methodology |
Early online date | 1 Jan 2025 |
Publication status | E-pub ahead of print - 1 Jan 2025 |
Abstract
Advances in information and communication technology, coupled with a smartphone increase in web surveys, provide new avenues for collecting answers from respondents. Specifically, the microphones of smartphones facilitate the collection of voice instead of text answers to open questions. Speech-to-text transcriptions through Automatic Speech Recognition (ASR) systems pose an efficient way to make voice answers accessible to text-as-data methods. However, there is little evidence on the transcription performance of ASR systems when it comes to voice answers. We therefore investigate the performance of two leading ASR systems–Google’s Cloud Speech-to-Text API and OpenAI’s Whisper–using voice answers to two open questions administered in a smartphone survey in Germany. The results indicate that Whisper produces more accurate transcriptions than Google’s API. Both systems produce similar errors, but these errors are more common for the Google API. However, the Google API is faster than both Whisper and human transcribers.
Keywords
- Automatic speech recognition (ASR), built-in microphone, narrative questions, smartphone survey, transcription quality
ASJC Scopus subject areas
- Social Sciences(all)
- General Social Sciences
Cite this
In: International Journal of Social Research Methodology, 01.01.2025.
Research output: Contribution to journal › Article › Research › peer review
TY - JOUR
T1 - Automatic speech-to-text transcription
T2 - evidence from a smartphone survey with voice answers
AU - Höhne, Jan Karem
AU - Lenzner, Timo
AU - Claassen, Joshua
N1 - Publisher Copyright: © 2024 The Author(s). Published by Informa UK Limited, trading as Taylor & Francis Group.
PY - 2025/1/1
Y1 - 2025/1/1
AB - Advances in information and communication technology, coupled with a smartphone increase in web surveys, provide new avenues for collecting answers from respondents. Specifically, the microphones of smartphones facilitate the collection of voice instead of text answers to open questions. Speech-to-text transcriptions through Automatic Speech Recognition (ASR) systems pose an efficient way to make voice answers accessible to text-as-data methods. However, there is little evidence on the transcription performance of ASR systems when it comes to voice answers. We therefore investigate the performance of two leading ASR systems–Google’s Cloud Speech-to-Text API and OpenAI’s Whisper–using voice answers to two open questions administered in a smartphone survey in Germany. The results indicate that Whisper produces more accurate transcriptions than Google’s API. Both systems produce similar errors, but these errors are more common for the Google API. However, the Google API is faster than both Whisper and human transcribers.
KW - Automatic speech recognition (ASR)
KW - built-in microphone
KW - narrative questions
KW - smartphone survey
KW - transcription quality
UR - http://www.scopus.com/inward/record.url?scp=85214406860&partnerID=8YFLogxK
U2 - 10.1080/13645579.2024.2443633
DO - 10.1080/13645579.2024.2443633
M3 - Article
AN - SCOPUS:85214406860
JO - International Journal of Social Research Methodology
JF - International Journal of Social Research Methodology
SN - 1364-5579
ER -