Publications
Retrieval and browsing of spoken content
Summary
Ever-increasing computing power and connectivity bandwidth, together with falling storage costs, are resulting in an overwhelming amount of data of various types being produced, exchanged, and stored. Consequently, information search and retrieval has emerged as a key application area. Text-based search is the most active area, with applications that range...
Measuring the readability of automatic speech-to-text transcripts
Summary
This paper reports initial results from a novel psycholinguistic study that measures the readability of several types of speech transcripts. We define a four-part figure of merit to measure readability: accuracy of answers to comprehension questions, reaction time for passage reading, reaction time for question answering, and a subjective rating of passage...
High-performance low-complexity wordspotting using neural networks
Summary
A high-performance, low-complexity neural network wordspotter was developed using radial basis function (RBF) neural networks in a hidden Markov model (HMM) framework. Two new complementary approaches substantially improve performance on the talker-independent Switchboard corpus. Figure of Merit (FOM) training adapts wordspotter parameters to directly improve the FOM performance metric...
Speech recognition by machines and humans
Summary
This paper reviews past work comparing modern speech recognition systems and humans to determine how far recent dramatic advances in technology have progressed towards the goal of human-like performance. Comparisons use six modern speech corpora with vocabularies ranging from 10 to more than 65,000 words and content ranging from read...
Speech recognition by humans and machines under conditions with severe channel variability and noise
Summary
Despite dramatic recent advances in speech recognition technology, speech recognizers still perform much worse than humans. The difference in performance between humans and machines is most dramatic when variable amounts and types of filtering and noise are present during testing. For example, humans readily understand speech that is low-pass filtered...
Recognition by humans and machines: miles to go before we sleep
Summary
Bourlard and his colleagues note that much effort over the past few years has focused on creating large-vocabulary speech recognition systems and reducing error rates measured using clean speech materials. This has led to experimental talker-independent systems with vocabularies of 65,000 words capable of transcribing sentences on a limited set...
Military and government applications of human-machine communication by voice
Summary
This paper describes a range of opportunities for military and government applications of human-machine communication by voice, based on visits and contacts with numerous user organizations in the United States. The applications include some that appear to be feasible by careful integration of current state-of-the-art technology and others that will...
A comparison of signal processing front ends for automatic word recognition
Summary
This paper compares the word error rate of a speech recognizer using several signal processing front ends based on auditory properties. Front ends were compared with a control cepstral front end based on mel filter banks (MFB), both in clean speech and in speech degraded by noise and spectral variability, using the...
Wordspotter training using figure-of-merit back propagation
Summary
A new approach to wordspotter training is presented that directly maximizes the Figure of Merit (FOM), defined as the average detection rate over a specified range of false alarm rates. This systematic approach to discriminant training for wordspotters eliminates the need for ad hoc thresholds and tuning. It improves the...
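The FOM described in this abstract can be illustrated with a minimal sketch. It assumes the "specified range" is 1 to 10 false alarms per keyword per hour (a common wordspotting convention, not stated in the summary above), evaluates only whole-number false alarm rates, and uses no interpolation between operating points. The function name `figure_of_merit` and its parameters are hypothetical illustrations, not the paper's code.

```python
from typing import Sequence, Tuple


def figure_of_merit(
    putative_hits: Sequence[Tuple[float, bool]],
    num_true_occurrences: int,
    hours: float,
    max_fa_per_hour: int = 10,  # assumed upper end of the false alarm range
) -> float:
    """Hypothetical FOM sketch for a single keyword.

    putative_hits: (score, is_true_hit) pairs produced by a wordspotter.
    Returns the detection rate averaged over 1..max_fa_per_hour
    false alarms per keyword per hour.
    """
    if num_true_occurrences <= 0 or hours <= 0:
        raise ValueError("need at least one true occurrence and positive duration")

    # Rank putative hits from most to least confident.
    ranked = sorted(putative_hits, key=lambda h: h[0], reverse=True)

    rates = []
    for fa_rate in range(1, max_fa_per_hour + 1):
        allowed_fa = fa_rate * hours  # false alarms permitted at this operating point
        detections = 0
        false_alarms = 0
        for _, is_hit in ranked:
            if is_hit:
                detections += 1
            else:
                false_alarms += 1
                # Lowering the threshold further would exceed the allowed
                # false alarms, so stop counting detections here.
                if false_alarms > allowed_fa:
                    break
        rates.append(detections / num_true_occurrences)

    # Average the detection rate across all false alarm operating points.
    return sum(rates) / len(rates)
```

For example, with one hour of speech, four true keyword occurrences, and ranked putative hits alternating true, false, true, false, true, the sketch detects 2 of 4 at one false alarm per hour and 3 of 4 at every higher rate, giving an FOM of 0.725. Averaging over a range of operating points, rather than picking one threshold, is what lets a training criterion target the FOM directly instead of relying on a hand-tuned threshold.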
Demonstrations and applications of spoken language technology: highlights and perspectives from the 1993 ARPA Spoken Language Technology and Applications Day
Summary
The ARPA Spoken Language Technology and Applications Day (SLTA'93) was a special workshop that presented a set of live, state-of-the-art demonstrations of speech recognition and Spoken Language Understanding systems. The purpose of this paper is to provide perspective on current opportunities for the applications these systems can enable, and to review the...