Publications

Implications of glottal source for speaker and dialect identification

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. II, 15-19 March 1999, pp. 813-816.

Summary

In this paper we explore the importance of speaker-specific information carried in the glottal source. We time-align utterances of two speakers speaking the same sentence from the TIMIT database of American English. We then extract the glottal flow derivative from each speaker and interchange them. Through time alignment and this glottal flow transformation, we can make a speaker of a northern dialect sound more like his southern counterpart. We also time-align the utterances of two speakers of Spanish dialects speaking the same sentence and then perform the glottal waveform transformation. Through these processes a Peruvian speaker is made to sound more Cuban-like. From these experiments we conclude that significant speaker- and dialect-specific information, such as noise, breathiness or aspiration, and vocalization, is carried in the glottal signal.
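
The time-alignment step can be illustrated with standard dynamic time warping (DTW) over per-frame feature sequences. This is a minimal sketch under the assumption that each utterance is represented as a frames-by-features array; the paper's actual pipeline also performs glottal flow derivative extraction and interchange, which is not shown.

```python
import numpy as np

def dtw_align(a, b):
    """Dynamic time warping between two feature sequences.

    a, b: 2-D arrays (frames x features). Returns the warping path as a
    list of (i, j) frame pairs and the total alignment cost. Illustrative
    only -- alignment of real TIMIT utterances would use richer spectral
    features per frame.
    """
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],
                                 cost[i, j - 1],
                                 cost[i - 1, j - 1])
    # Backtrack from (n, m) to recover the warping path
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([cost[i - 1, j - 1], cost[i - 1, j], cost[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1], cost[n, m]
```

Once the path is known, frames of one speaker can be paired with the corresponding frames of the other before any per-frame transformation is applied.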

'Perfect reconstruction' time-scaling filterbanks

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. III, 15-19 March 1999, pp. 945-948.

Summary

A filterbank-based method of time-scale modification is analyzed for elemental signals including clicks, sines, and AM-FM sines. It is shown that with the use of some basic properties of linear systems, as well as FM-to-AM filter transduction, "perfect reconstruction" time-scaling filterbanks can be constructed for these elemental signal classes under certain conditions on the filterbank. Conditions for perfect reconstruction time-scaling are shown analytically for the uniform filterbank case and empirically for the nonuniform constant-Q (gammatone) case. Extension of perfect reconstruction to multi-component signals is shown to require both filterbank and signal-dependent conditions and indicates the need for a more complete theory of "perfect reconstruction" time-scaling filterbanks.

AM-FM separation using shunting neural networks

Published in:
Proc. of the IEEE-SP Int. Symp. on Time-Frequency and Time-Scale Analysis, 6-9 October 1998, pp. 553-556.

Summary

We describe an approach to estimating the amplitude-modulated (AM) and frequency-modulated (FM) components of a signal. Any signal can be written as the product of an AM component and an FM component. There have been several approaches to solving the AM-FM estimation problem described in the literature. Popular methods include the use of time-frequency analysis, the Hilbert transform, and the Teager energy operator. We focus on an approach based on FM-to-AM transduction that is motivated by auditory physiology. We show that the transduction approach can be realized as a bank of bandpass filters followed by envelope detectors and shunting neural networks, and the resulting dynamical system is capable of robust AM-FM estimation in noisy environments and over a broad range of filter bandwidths and locations. Our model is consistent with recent psychophysical experiments that indicate AM and FM components of acoustic signals may be transformed into a common neural code in the brain stem via FM-to-AM transduction. Applications of our model include signal recognition and multi-component decomposition.
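
The envelope-detection and shunting stages of the chain described above can be sketched as follows. The smoothing coefficient, gain parameters, and Euler integration step are illustrative choices, not the paper's; the preceding bandpass filterbank is omitted.

```python
import numpy as np

def envelope(x):
    """Crude envelope detector: rectify, then smooth with a one-pole
    lowpass filter (smoothing coefficient is illustrative)."""
    y = np.abs(x)
    env = np.zeros_like(y)
    alpha = 0.99
    for n in range(1, len(y)):
        env[n] = alpha * env[n - 1] + (1 - alpha) * y[n]
    return env

def shunting(e, A=1.0, B=1.0, dt=1e-4):
    """Feedforward shunting equation dx/dt = -A*x + (B - x)*e(t),
    integrated with forward Euler. Its steady state x = B*e/(A + e)
    performs the automatic gain normalization that makes the
    downstream AM-FM estimate robust to level changes."""
    x = np.zeros_like(e)
    for n in range(1, len(e)):
        x[n] = x[n - 1] + dt * (-A * x[n - 1] + (B - x[n - 1]) * e[n])
    return x
```

Feeding each bandpass channel's envelope through such a shunting stage yields outputs bounded by B regardless of input level, which is the normalization property exploited for robust estimation in noise.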

Magnitude-only estimation of handset nonlinearity with application to speaker recognition

Published in:
Proc. of the 1998 IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. II, Speech Processing II; Neural Networks for Signal Processing, 12-15 May 1998, pp. 745-748.

Summary

A method is described for estimating telephone handset nonlinearity by matching the spectral magnitude of the distorted signal to the output of a nonlinear channel model, driven by an undistorted reference. The "magnitude-only" representation allows the model to directly match unwanted speech formants that arise over nonlinear channels and that are a potential source of degradation in speaker and speech recognition algorithms. As such, the method is particularly suited to algorithms that use only spectral magnitude information. The distortion model consists of a memoryless polynomial nonlinearity sandwiched between two finite-length linear filters. Minimization of a mean-squared spectral magnitude error, with respect to model parameters, relies on iterative estimation via a gradient descent technique, using a Jacobian in the iterative correction term with gradients calculated by finite-difference approximation. Initial work has demonstrated the algorithm's usefulness in speaker recognition over telephone channels by reducing mismatch between high- and low-quality handset conditions.
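
A much-simplified sketch of the estimation idea: fit a memoryless polynomial by gradient descent on a mean-squared spectral-magnitude error, with gradients obtained by finite differences. The paper's model additionally sandwiches the polynomial between two linear filters; the function names, step size, and iteration count here are illustrative.

```python
import numpy as np

def spec_mag(x, nfft=64):
    """Spectral magnitude of a signal (FFT size is illustrative)."""
    return np.abs(np.fft.rfft(x, nfft))

def fit_polynomial_nonlinearity(ref, distorted, order=3, iters=300,
                                step=1e-3, eps=1e-5):
    """Fit y ~= sum_k a_k * ref**k so that the spectral magnitude of y
    matches that of the distorted signal, by gradient descent with
    finite-difference gradients. A simplified stand-in for the full
    filter-polynomial-filter channel model."""
    a = np.zeros(order + 1)
    a[1] = 1.0  # start from the identity mapping
    target = spec_mag(distorted)

    def err(coeffs):
        y = sum(c * ref**k for k, c in enumerate(coeffs))
        return np.mean((spec_mag(y) - target) ** 2)

    for _ in range(iters):
        base = err(a)
        grad = np.zeros_like(a)
        for k in range(order + 1):
            pert = a.copy()
            pert[k] += eps
            grad[k] = (err(pert) - base) / eps
        a -= step * grad
    return a
```

Because only spectral magnitudes enter the error, the fit never needs phase information from the distorted channel, which is the point of the "magnitude-only" formulation.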

Audio signal processing based on sinusoidal analysis/synthesis

Published in:
Chapter 9 in Applications of Digital Signal Processing to Audio and Acoustics, 1998, pp. 343-416.

Summary

Based on a sinusoidal model, an analysis/synthesis technique is developed that characterizes audio signals, such as speech and music, in terms of the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated by applying a peak-picking algorithm to the short-time Fourier transform of the input waveform. Rapid changes in the highly resolved spectral components are tracked by using a frequency-matching algorithm and the concept of "birth" and "death" of the underlying sine waves. For a given frequency track, a cubic phase function is applied to the sine-wave generator, whose output is amplitude-modulated and added to sines for other frequency tracks. The resulting synthesized signal preserves the general waveform shape and is nearly perceptually indistinguishable from the original, thus providing the basis for a variety of applications including signal modification, sound splicing, morphing and extrapolation, and estimation of sound characteristics such as vibrato. Although this sine-wave analysis/synthesis is applicable to arbitrary signals, tailoring the system to a specific sound class can improve performance. A source/filter phase model is introduced within the sine-wave representation to improve signal modification, as in time-scale and pitch change and dynamic range compression, by attaining phase coherence where sine-wave phase relations are preserved or controlled. A similar method of achieving phase coherence is also applied in revisiting the classical phase vocoder to improve modification of certain signal classes. A second refinement of the sine-wave analysis/synthesis invokes an additive deterministic/stochastic representation of sounds consisting of simultaneous harmonic and inharmonic contributions. A method of frequency tracking is given for the separation of these components, and is used in a number of applications.
The sine-wave model is also extended to two additively combined signals for the separation of simultaneous talkers or music duets. Finally, the use of sine-wave analysis/synthesis in providing insight for FM synthesis is described, and remaining challenges, such as an improved sine-wave representation of rapid attacks and other transient events, are presented.
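
The analysis front end (peak picking on the short-time Fourier transform) and a single-frame synthesis can be sketched as below. Frame-to-frame track matching and cubic phase interpolation, central to the full system, are omitted, and the window and FFT sizes are illustrative.

```python
import numpy as np

def pick_peaks(frame, fs, nfft=1024):
    """Return (amplitude, frequency, phase) triples at local maxima of
    the short-time spectral magnitude -- the per-frame sine-wave
    parameters, without any frame-to-frame matching."""
    win = np.hanning(len(frame))
    spec = np.fft.rfft(frame * win, nfft)
    mag = np.abs(spec)
    peaks = []
    for k in range(1, len(mag) - 1):
        if mag[k] > mag[k - 1] and mag[k] > mag[k + 1]:
            # 2/win.sum() undoes the window gain for a real sinusoid
            peaks.append((2 * mag[k] / win.sum(), k * fs / nfft,
                          np.angle(spec[k])))
    return peaks

def synthesize(peaks, n, fs):
    """Sum of constant-parameter sine waves for one frame (the full
    system instead interpolates amplitude and phase across frames)."""
    t = np.arange(n) / fs
    return sum(a * np.cos(2 * np.pi * f * t + p) for a, f, p in peaks)
```

In the complete analysis/synthesis system, peaks from successive frames are linked into frequency tracks and the phase of each track is interpolated with a cubic polynomial before synthesis.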

Noise reduction based on spectral change

Published in:
Proc. of the 1997 IEEE ASSP Workshop on Applications of Signal Processing to Audio and Acoustics, Session 8: Noise Reduction, 19-22 October 1997, 4 pages.

Summary

A noise reduction algorithm is designed for the aural enhancement of short-duration wideband signals. The signal of interest contains components possibly only a few milliseconds in duration and corrupted by a nonstationary noise background. The essence of the enhancement technique is a Wiener filter that uses a desired signal spectrum whose estimation adapts to the "degree of stationarity" of the measured signal. The degree of stationarity is derived from a short-time spectral derivative measurement, motivated by sensitivity of biological systems to spectral change. Adaptive filter design tradeoffs are described, reflecting the accuracy of signal attack, background fidelity, and perceptual quality of the desired signal. Residual representations for binaural presentation are also considered.
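
The adaptive Wiener gain can be sketched per frequency bin as follows. The stationarity measure and smoothing rule here are simplified stand-ins for the paper's short-time spectral-derivative formulation, and the smoothing constant is an illustrative choice.

```python
import numpy as np

def adaptive_wiener_gain(noisy_psd, noise_psd, prev_psd, beta=0.9):
    """Per-bin Wiener gain S/(S + N), where the desired-signal PSD
    estimate S adapts to spectral change: bins whose power changes
    rapidly between frames (low stationarity) lean on the current
    frame, while slowly varying bins are smoothed with the previous
    estimate. A minimal sketch of the idea only."""
    # Degree of stationarity per bin, in [0, 1]: 1 = unchanged spectrum
    change = np.abs(noisy_psd - prev_psd) / (noisy_psd + prev_psd + 1e-12)
    stationarity = 1.0 - change
    # Stationary bins are averaged more heavily with the past estimate
    smooth = beta * stationarity
    signal_psd = np.maximum(
        smooth * prev_psd + (1 - smooth) * noisy_psd - noise_psd, 0.0)
    return signal_psd / (signal_psd + noise_psd + 1e-12)
```

A bin carrying a sudden wideband onset therefore passes through with little attenuation (preserving signal attack), while stationary background bins are suppressed.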

Embedded dual-rate sinusoidal transform coding

Published in:
Proc. IEEE Workshop on Speech Coding for Telecommunications Proc.: Back to Basics: Attacking Fundamental Problems in Speech Coding, 7-10 September 1997, pp. 33-34.

Summary

This paper describes the development of a dual-rate Sinusoidal Transform Coder in which a 2400 b/s coder is embedded as a separate packet in the 4800 b/s bit stream. The underlying coding structure provides the flexibility necessary for multirate speech coding and multimedia applications.

AM-FM separation using auditory-motivated filters

Published in:
IEEE Trans. Speech Audio Process., Vol. 5, No. 5, September 1997, pp. 465-480.

Summary

An approach to the joint estimation of sine-wave amplitude modulation (AM) and frequency modulation (FM) is described based on the transduction of frequency modulation into amplitude modulation by linear filters, motivated by the hypothesis that the auditory system uses a similar transduction mechanism in measuring sine-wave FM. An AM-FM estimation is described that uses the amplitude envelope of the output of two transduction filters of piecewise-linear spectral shape. The piecewise-linear constraint is then relaxed, allowing a wider class of transduction-filter pairs for AM-FM separation under a monotonicity constraint of the filters' quotient. The particular cases of Gaussian filters and measured auditory filters, although not leading to closed-form solutions, provide for iterative AM-FM estimation. Solution stability analysis and error evaluation are performed, and the FM transduction method is compared with the energy separation algorithm, based on the Teager energy operator, and the Hilbert transform method for AM-FM estimation. Finally, a generalization to two-dimensional (2-D) filters is described.
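
For the piecewise-linear case, the separation reduces to inverting an envelope ratio in closed form. A minimal sketch assuming linear-slope magnitude responses |H1(f)| = a + b*f and |H2(f)| = a - b*f over the band of interest (the values of a and b, and the exact responses, are illustrative stand-ins for the filter pairs analyzed in the paper):

```python
import numpy as np

def amfm_transduce(env1, env2, a=1.0, b=0.5):
    """AM-FM separation from the envelopes of two transduction filters
    with opposite-slope linear magnitude responses. For a sine wave
    A(t)*cos(phi(t)), each envelope is A(t)*|Hi(f(t))|, so the envelope
    ratio cancels A(t) and yields the instantaneous frequency."""
    r = env1 / env2
    f = (a / b) * (r - 1.0) / (r + 1.0)  # invert (a + b*f)/(a - b*f) = r
    A = env1 / (a + b * f)               # recover the AM from either filter
    return A, f

# Toy example: envelopes generated exactly from known A(t) and f(t)
A_true = np.array([1.0, 2.0, 0.5])
f_true = np.array([0.3, 0.8, 1.2])
env1 = A_true * (1.0 + 0.5 * f_true)
env2 = A_true * (1.0 - 0.5 * f_true)
A_est, f_est = amfm_transduce(env1, env2)
```

When the responses are Gaussian or measured auditory filters instead of linear slopes, this inversion has no closed form and the estimate must be obtained iteratively, as the abstract notes.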

Fine structure features for speaker identification

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, Speech (Part II), 7-10 May 1996, pp. 689-692.

Summary

The performance of speaker identification (SID) systems can be improved by the addition of the rapidly varying "fine structure" features of formant amplitude and/or frequency modulation and multiple excitation pulses. This paper shows how the estimation of such fine structure features can be improved further by obtaining better estimates of formant frequency locations and uncovering various sources of error in the feature extraction systems. Most female telephone speech showed "spurious" formants, due to distortion in the telephone network. Nevertheless, SID performance was greatest with these spurious formants as formant estimates. A new feature has also been identified which can increase SID performance: cepstral coefficients from noise in the estimated excitation waveform. Finally, statistical tools have been developed to explore the relative importance of features used for SID, with the ultimate goal of uncovering the source of the features that provide SID performance improvement.

Low rate coding of the spectral envelope using channel gains

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, 7-10 May 1996, pp. 769-772.

Summary

A dual-rate embedded sinusoidal transform coder is described in which a core 14th-order all-pole coder operating at 2400 b/s is augmented with a set of channel-gain residuals in order to operate at the higher 4800 b/s rate. The channel gains are a set of non-uniformly spaced samples of the spline envelope and constitute a lowpass estimate of the short-time vocal tract magnitude spectrum. The channel-gain residuals represent the difference between the spline envelope and the quantized 14th-order all-pole spectrum at the channel-gain frequencies. The channel-gain residuals are coded using pitch-dependent scalar quantization. Informal listening indicates that the quality of the embedded coder at 4800 b/s is comparable to that of an existing high-quality 4800 b/s all-pole coder.