Publications

Energy separation in signal modulations with application to speech analysis

Published in:
IEEE Trans. Signal Process., Vol. 41, No. 10, October 1993, pp. 3024-3051.

Summary

Oscillatory signals that have both an amplitude-modulation (AM) and a frequency-modulation (FM) structure are encountered in almost all communication systems. We have also used these structures recently for modeling speech resonances, motivated by previous work investigating fluid dynamics phenomena in speech production that provides evidence for the existence of modulations in speech signals. In this paper, we use a nonlinear differential operator that can detect modulations in AM-FM signals by estimating the product of their time-varying amplitude and frequency. This operator essentially tracks the energy needed by a source to produce the oscillatory signal. To solve the fundamental problem of estimating both the amplitude envelope and instantaneous frequency of an AM-FM signal, we develop a novel approach that uses nonlinear combinations of instantaneous signal outputs from the energy operator to separate its output energy product into its amplitude modulation and frequency modulation components. The theoretical analysis is done first for continuous-time signals. Then several efficient algorithms are developed and compared for estimating the amplitude envelope and instantaneous frequency of discrete-time AM-FM signals. These energy separation algorithms are then applied to search for modulations in speech resonances, which we model using AM-FM signals to account for time-varying amplitude envelopes and instantaneous frequencies. Our experimental results provide evidence that bandpass filtered speech signals around speech formants contain amplitude and frequency modulations within a pitch period. Overall, the energy separation algorithms, due to their very low computational complexity and instantaneously-adapting nature, are very useful in detecting modulation patterns in speech and other time-varying signals.
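To make the idea concrete, here is a minimal sketch of one well-known discrete energy separation algorithm (the DESA-2 form), built on the discrete Teager-Kaiser operator Psi[x](n) = x(n)^2 - x(n-1)x(n+1). This is an illustration of the general technique, not necessarily the exact algorithm variants compared in the paper:

```python
import math

def teager(x, n):
    """Discrete Teager-Kaiser energy operator: Psi[x](n) = x(n)^2 - x(n-1) x(n+1)."""
    return x[n] * x[n] - x[n - 1] * x[n + 1]

def desa2(x, n):
    """DESA-2 energy separation at sample n (requires x[n-2] .. x[n+2]).

    For an AM-FM signal x(n) = a(n) cos(phi(n)), returns estimates of the
    instantaneous amplitude |a(n)| and frequency Omega(n) in rad/sample.
    """
    psi_x = teager(x, n)
    # Apply the operator also to y(n) = x(n+1) - x(n-1), a discrete derivative.
    y_prev = x[n] - x[n - 2]
    y_curr = x[n + 1] - x[n - 1]
    y_next = x[n + 2] - x[n]
    psi_y = y_curr * y_curr - y_prev * y_next
    # Nonlinear combination of the two energies separates frequency and amplitude.
    omega = 0.5 * math.acos(1.0 - psi_y / (2.0 * psi_x))
    amp = 2.0 * psi_x / math.sqrt(psi_y)
    return amp, omega
```

On a pure cosine A cos(Omega n) this recovers A and Omega essentially exactly, and for slowly varying AM-FM signals it tracks the envelope and instantaneous frequency sample by sample, which is the low-complexity, instantaneously-adapting behavior the abstract refers to.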

LNKnet: Neural network, machine-learning, and statistical software for pattern classification

Published in:
Lincoln Laboratory Journal, Vol. 6, No. 2, Summer/Fall 1993, pp. 249-268.

Summary

Pattern-classification and clustering algorithms are key components of modern information processing systems used to perform tasks such as speech and image recognition, printed-character recognition, medical diagnosis, fault detection, process control, and financial decision making. To simplify the task of applying these types of algorithms in new application areas, we have developed LNKnet, a software package that provides access to more than 20 pattern-classification, clustering, and feature-selection algorithms. Included are the most important algorithms from the fields of neural networks, statistics, machine learning, and artificial intelligence. The algorithms can be trained and tested on separate data or tested with automatic cross-validation. LNKnet runs under the UNIX operating system, and access to the different algorithms is provided through a graphical point-and-click user interface. Graphical outputs include two-dimensional (2-D) scatter and decision-region plots and 1-D plots of data histograms, classifier outputs, and error rates during training. Parameters of trained classifiers are stored in files from which the parameters can be translated into source-code subroutines (written in the C programming language) that can then be embedded in a user application program. Lincoln Laboratory and other research laboratories have used LNKnet successfully for many diverse applications.

Automatic language identification using Gaussian mixture and hidden Markov models

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 2, Speech Processing, ICASSP, 27-30 April 1993, pp. 399-402.

Summary

Ergodic, continuous-observation, hidden Markov models (HMMs) were used to perform automatic language classification and detection of speech messages. State observation probability densities were modeled as tied Gaussian mixtures. The algorithm was evaluated on four multilanguage speech databases: a three-language subset of the Spoken Language Library, a three-language subset of a five-language Rome Laboratory database, the 20-language CCITT database, and the ten-language OGI telephone speech database. Generally, performance of a single-state HMM (i.e., a static Gaussian mixture classifier) was comparable to the multistate HMMs, indicating that the sequential modeling capabilities of HMMs were not exploited.

Detection of transient signals using the energy operator

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 3, ICASSP, 27-30 April 1993, pp. 145-148.

Summary

A function of the Teager-Kaiser energy operator is introduced as a method for detecting transient signals in the presence of amplitude-modulated and frequency-modulated tonal interference. This function has excellent time resolution and is robust in the presence of white noise. The output of the detection function is also independent of the interference-to-transient ratio when that ratio is large. It is demonstrated that the detection function can be applied to interference signals with multiple amplitude-modulated and frequency-modulated tonal components.
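The time resolution the abstract mentions can be seen directly from the raw Teager-Kaiser operator itself. The following toy sketch (the raw operator on a synthetic signal, not the paper's specific detection function) shows that Psi[x] is nearly constant on a steady tone but spikes sharply at a single-sample transient:

```python
import math

def psi(x):
    """Discrete Teager-Kaiser energy operator applied across a sequence.

    Output index n corresponds to input index n + 1.
    """
    return [x[n] ** 2 - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]

# Toy signal: a steady tone with a single-sample transient added at n = 100.
N = 200
tone = [math.cos(0.4 * n) for n in range(N)]
tone[100] += 5.0  # the "transient"

e = psi(tone)
peak = max(range(len(e)), key=lambda n: abs(e[n]))
# On the clean tone, Psi is flat at about sin^2(0.4) ~= 0.15; with the
# transient, |Psi| jumps by two orders of magnitude right at the event.
print(peak + 1)  # location of the detected event in the original signal
```

Because Psi uses only three adjacent samples, the response to the transient is confined to its immediate neighborhood, which is the source of the excellent time resolution.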

Time-scale modification of complex acoustic signals

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, Plenary, Special, Audio, Underwater Acoustics, VLSI, Neural Networks, 27-30 April 1993, pp. 213-216.

Summary

A new approach is introduced for time-scale modification of short-duration complex acoustic signals to improve their audibility. The technique constrains the modified signal to take on a specified spectral characteristic while imposing a time-scaled version of the original temporal envelope. Both full-band and sub-band representations of the temporal envelope are considered. In the full-band case, the modified signal is obtained by appropriate selection of its Fourier transform phase. In the sub-band case, using locations of maxima in the sub-band temporal envelopes, the phase of each bandpass signal is formed to preserve "events" in the envelope of the composite signal. The approach is applied to synthetic and actual short-duration acoustic signals consisting of closely-spaced and overlapping sequential time components.

Time-scale modification with temporal envelope invariance

Published in:
Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 17-20 October 1993, pp. 127-130.

Summary

A new approach is introduced for time-scale modification of short-duration complex acoustic signals to improve their audibility. The method preserves the time-scaled temporal envelope of a signal and for enhancement capitalizes on the perceptual importance of a signal's temporal structure. The basis for the approach is a sub-band representation whose channel phases are controlled to shape the temporal envelope of the time-scaled signal. The phase control is derived from locations of events which occur within filterbank outputs. A frame-based generalization of the method imposes phase consistency across consecutive synthesis frames. The approach is applied to synthetic and actual short-duration acoustic signals consisting of closely-spaced and overlapping sequential time components.

Two-talker pitch tracking for co-channel talker interference suppression

Published in:
MIT Lincoln Laboratory Report TR-951

Summary

Almost all co-channel talker interference suppression systems use the difference in the pitches of the target and jammer speakers to suppress the jammer and enhance the target. While joint pitch estimators that output two pitch estimates as a function of time have been proposed, the task of properly assigning each pitch to a speaker (two-talker pitch tracking) has proven difficult. This report describes several approaches to the two-talker pitch tracking problem, including algorithms for pitch track interpolation, spectral envelope tracking, and spectral envelope classification. When evaluated on an all-voiced two-talker database, the best of these new tracking systems correctly assigned pitch 87% of the time, given perfect joint pitch estimation.

An integrated speech-background model for robust speaker identification

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, 23-26 March 1992, pp. 185-188.

Summary

This paper examines a procedure for text-independent speaker identification in noisy environments where the interfering background signals cannot be characterized using traditional broadband or impulsive noise models. In the procedure, both the speaker and the background processes are modeled using mixtures of Gaussians. Speaker and background models are integrated into a unified statistical framework allowing the decoupling of the underlying speech process from the noise-corrupted observations via the expectation-maximization algorithm. Using this formalism, speaker model parameters are estimated in the presence of the background process, and a scoring procedure is implemented for computing the speaker likelihood in the noise-corrupted environment. Performance is evaluated using a 16-speaker conversational speech database with both "speech babble" and white noise background processes.
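The scoring side of such a system rests on Gaussian mixture likelihoods. The sketch below shows generic diagonal-covariance GMM scoring for speaker identification (the common baseline, not the paper's integrated speech-background formalism, which additionally decouples speech from the background process via EM):

```python
import math

def gmm_loglike(frame, weights, means, variances):
    """Log-likelihood of one feature frame under a diagonal-covariance GMM.

    weights: K mixture weights (summing to 1)
    means, variances: K lists of per-dimension Gaussian parameters
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for x, m, v in zip(frame, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        log_terms.append(ll)
    # log-sum-exp over mixture components for numerical stability
    mx = max(log_terms)
    return mx + math.log(sum(math.exp(t - mx) for t in log_terms))

def score_speaker(frames, gmm):
    """Total log-likelihood of an utterance under one speaker's model.

    Identification picks the speaker model with the highest score.
    """
    return sum(gmm_loglike(f, *gmm) for f in frames)
```

In a noise-free setting, identification is simply an argmax of score_speaker over the enrolled speaker models; the paper's contribution is making such scores meaningful when the observations are corrupted by a structured background process.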

A speech recognizer using radial basis function neural networks in an HMM framework

Published in:
ICASSP'92, Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 1, Speech Processing 1, 23-26 March 1992, pp. 629-632.

Summary

A high-performance speaker-independent isolated-word speech recognizer was developed which combines hidden Markov models (HMMs) and radial basis function (RBF) neural networks. RBF networks in this recognizer use discriminant training techniques to estimate Bayesian probabilities for each speech frame, while HMM decoders estimate overall word likelihood scores for network outputs. RBF training is performed after the HMM recognizer has automatically segmented training tokens using forced Viterbi alignment. In recognition experiments using a speaker-independent E-set database, the hybrid recognizer had an error rate of 11.5% compared to 15.7% for the robust unimodal Gaussian HMM recognizer upon which the hybrid system was based. The error rate was also lower than that of a tied-mixture HMM recognizer with the same number of centers. These results demonstrate that RBF networks can be successfully incorporated in hybrid recognizers and suggest that they may be capable of good performance with fewer parameters than required by Gaussian mixture classifiers.
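For readers unfamiliar with the RBF side of the hybrid, the forward pass of a basic RBF classifier layer is sketched below. This is a generic single-width illustration (the paper's networks, trained discriminantly, may use per-center widths and other refinements):

```python
import math

def rbf_outputs(x, centers, sigma, weights):
    """Forward pass of a simple radial basis function classifier layer.

    Hidden unit k computes phi_k = exp(-||x - c_k||^2 / (2 sigma^2));
    each class output is a weighted sum of the phi_k.  With discriminant
    training, these outputs approximate per-class posterior probabilities,
    which is what lets an HMM decoder consume them as frame scores.
    """
    phis = []
    for c in centers:
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        phis.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    return [sum(w * p for w, p in zip(ws, phis)) for ws in weights]
```

The "number of centers" compared in the experiments corresponds to the length of `centers` here; the result suggests RBF hidden units can be used more economically than Gaussian mixture components of a comparable count.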

Shape invariant time-scale and pitch modification of speech

Published in:
IEEE Trans. Signal Process., Vol. 40, No. 3, March 1992, pp. 497-510.

Summary

The simplified linear model of speech production predicts that when the rate of articulation is changed, the resulting waveform takes on the appearance of the original, except for a change in the time scale. The goal of this paper is to develop a time-scale modification system that preserves this shape-invariance property during voicing. This is done using a version of the sinusoidal analysis-synthesis system that models and independently modifies the phase contributions of the vocal tract and vocal cord excitation. An important property of the system is its capability of performing time-varying rates of change. Extensions of the method are applied to fixed and time-varying pitch modification of speech. The sine-wave analysis-synthesis system also allows for shape-invariant joint time-scale and pitch modification, and allows for the adjustment of the time scale and pitch according to speech characteristics such as the degree of voicing.
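The core mechanism of sinusoidal analysis-synthesis time-scale modification can be sketched in a few lines: stretch the amplitude and frequency tracks in time while leaving the frequencies themselves unchanged, then resynthesize by phase accumulation. This is a single-component toy illustration, not the paper's shape-invariant system (which additionally separates and controls the vocal-tract and excitation phase contributions):

```python
import math

def synthesize(amps, freqs):
    """Sine-wave synthesis by phase accumulation.

    amps, freqs: per-sample amplitude and frequency (rad/sample) tracks
    for a single sinusoidal component.
    """
    out, phase = [], 0.0
    for a, w in zip(amps, freqs):
        phase += w
        out.append(a * math.cos(phase))
    return out

def time_scale(track, rho):
    """Stretch a parameter track by factor rho (piecewise-constant resampling)."""
    n = int(len(track) * rho)
    return [track[min(int(k / rho), len(track) - 1)] for k in range(n)]

# Slow down by 2x: the signal lasts twice as long, but its instantaneous
# frequency at each output sample is the same as in the original track.
amps = [1.0] * 100
freqs = [0.3] * 100
slow = synthesize(time_scale(amps, 2.0), time_scale(freqs, 2.0))
```

Because rho can vary along the track, the same mechanism supports the time-varying rates of change the abstract mentions; pitch modification instead scales the excitation frequencies while compensating the vocal-tract contribution.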