Understanding the Effects of Time-Scale Modification is Crucial for Improving Speech Recognition Performance

Published 31 Dec 2022 • vol 3, no 2


Authors:

Erivelto Luís de Souza, Universidade Federal de São João Del Rei – DTECH, Brazil

Abstract:

In this paper, we examine how time-scale modification affects the performance of automatic speech recognition (ASR) systems, and in particular how changing the temporal dimension of speech can significantly affect recognition accuracy. We first define speaking rate normalization: scaling the duration of each audio sample so that all samples share a consistent speaking pace. Normalization is achieved by choosing a scaling factor for time-scale modification that maps each sample onto a standard rate. This step matters because uneven speaking rates can reduce recognition accuracy. Next, we describe how we classify speech data by speaking rate, grouping samples into categories that represent different rates of speech so that ASR performance can be analyzed category by category. Each category presents its own characteristics and challenges for the recognizer. Finally, we discuss the performance differences observed across these speaking-rate categories, showing how recognition accuracy varies with speaking rate and what this reveals about the factors behind successful and failed recognition. These findings point to opportunities for refining ASR systems and making them more robust to the diverse speech patterns found in real-world applications.
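
The normalization and classification steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's method: the abstract does not specify the rate unit, the target rate, or the class boundaries, so the syllables-per-second measure, the 4.5 syllables/s target, and the slow/medium/fast thresholds below are all hypothetical. The audio stretching itself, which a time-scale modification algorithm would perform, is not shown; the sketch only computes the factor that such an algorithm would receive.

```python
from dataclasses import dataclass

# Hypothetical class boundaries in syllables per second (not from the paper).
SLOW_MAX = 3.5
FAST_MIN = 5.5


@dataclass
class Utterance:
    name: str
    n_syllables: int   # assumed to come from a transcript or syllable detector
    duration_s: float  # speech duration in seconds (assumption: pauses excluded)


def speaking_rate(u: Utterance) -> float:
    """Speaking rate in syllables per second."""
    if u.duration_s <= 0:
        raise ValueError("duration must be positive")
    return u.n_syllables / u.duration_s


def duration_scale_factor(rate: float, target_rate: float) -> float:
    """Factor by which an utterance's duration is multiplied to reach target_rate.

    New duration = n / target_rate = duration * (rate / target_rate), so a fast
    speaker (rate > target) gets a factor > 1 (stretched) and a slow speaker
    gets a factor < 1 (compressed).
    """
    if target_rate <= 0:
        raise ValueError("target rate must be positive")
    return rate / target_rate


def rate_class(rate: float) -> str:
    """Assign an utterance to a hypothetical speaking-rate category."""
    if rate < SLOW_MAX:
        return "slow"
    if rate >= FAST_MIN:
        return "fast"
    return "medium"


if __name__ == "__main__":
    utterances = [
        Utterance("a", 12, 4.0),  # 3.0 syllables/s
        Utterance("b", 18, 4.0),  # 4.5 syllables/s
        Utterance("c", 24, 4.0),  # 6.0 syllables/s
    ]
    target = 4.5  # hypothetical standard rate
    for u in utterances:
        r = speaking_rate(u)
        print(u.name, rate_class(r), round(duration_scale_factor(r, target), 3))
    # a slow 0.667
    # b medium 1.0
    # c fast 1.333
```

The factor here is expressed as a duration multiplier. Tools that take a speed factor instead, such as the `rate` argument of `librosa.effects.time_stretch` (where values above 1 speed the signal up), would need its reciprocal, `target_rate / rate`.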

Keywords:

Time-Scale Modification, Automatic Speech Recognition (ASR), Speaking Rate Normalization

Citations:

APA:
De Souza, E. L. (2022). Understanding the Effects of Time-Scale Modification is Crucial for Improving Speech Recognition Performance. Journal of Science and Engineering Management, 3(2), 47-56. https://doi.org/10.33832/jsem.2022.3.2.05