Based on a sinusoidal model, an analysis/synthesis technique is developed that characterizes audio signals, such as speech and music, in terms of the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated by applying a peak-picking algorithm to the short-time Fourier transform of the input waveform. Rapid changes in the highly resolved spectral components are tracked by using a frequency-matching algorithm and the concept of "birth" and "death" of the underlying sine waves. For a given frequency track, a cubic phase function is applied to the sine-wave generator, whose output is amplitude-modulated and added to sines for other frequency tracks. The resulting synthesized signal preserves the general waveform shape and is nearly perceptually indistinguishable from the original, thus providing the basis for a variety of applications including signal modification, sound splicing, morphing and extrapolation, and estimation of sound characteristics such as vibrato. Although this sine-wave analysis/synthesis is applicable to arbitrary signals, tailoring the system to a specific sound class can improve performance. A source/filter phase model is introduced within the sine-wave representation to improve signal modification, as in time-scale and pitch change and dynamic range compression, by attaining phase coherence where sine-wave phase relations are preserved or controlled. A similar method of achieving phase coherence is also applied in revisiting the classical phase vocoder to improve modification of certain signal classes. A second refinement of the sine-wave analysis/synthesis invokes an additive deterministic/stochastic representation of sounds consisting of simultaneous harmonic and aharmonic contributions. A method of frequency tracking is given for the separation of these components, and is used in a number of applications. The sine-wave model is also extended to two additively combined signals for the separation of simultaneous talkers or music duets. Finally, the use of sine-wave analysis/synthesis in providing insight for FM synthesis is described, and remaining challenges, such as an improved sine-wave representation of rapid attacks and other transient events, are presented.
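The peak-picking step described in this summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`hann`, `half_spectrum`, `pick_peaks`), the Hann window, the -40 dB peak floor, and the test signal are all assumptions chosen for illustration, and a direct DFT stands in for a real FFT so that the example has no dependencies. Frequency estimates are limited to the bin spacing, and the frequency matching across frames (the "birth" and "death" of tracks) is not shown.

```python
import cmath
import math

def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / n) for i in range(n)]

def half_spectrum(x):
    """Direct DFT for bins 0..n/2 (slow, but dependency-free)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n // 2 + 1)]

def pick_peaks(frame, sample_rate, floor_db=-40.0):
    """Return (amplitude, frequency_hz, phase_rad) for each spectral peak.

    A peak is a local maximum of the windowed magnitude spectrum that lies
    within floor_db of the largest bin. Phase is referenced to the frame start.
    """
    n = len(frame)
    window = hann(n)
    spectrum = half_spectrum([s * w for s, w in zip(frame, window)])
    magnitude = [abs(c) for c in spectrum]
    scale = 2.0 / sum(window)  # maps a bin magnitude to a sine-wave amplitude
    threshold = max(magnitude) * 10.0 ** (floor_db / 20.0)
    peaks = []
    for k in range(1, len(magnitude) - 1):
        is_local_max = magnitude[k] > magnitude[k - 1] and magnitude[k] >= magnitude[k + 1]
        if is_local_max and magnitude[k] > threshold:
            peaks.append((magnitude[k] * scale,
                          k * sample_rate / n,
                          cmath.phase(spectrum[k])))
    return peaks

# Hypothetical test frame: two sine waves placed exactly on DFT bins.
sample_rate, n = 8000.0, 256
frame = [1.0 * math.cos(2.0 * math.pi * 500.0 * t / sample_rate + 0.3)
         + 0.5 * math.cos(2.0 * math.pi * 1250.0 * t / sample_rate - 1.0)
         for t in range(n)]
peaks = pick_peaks(frame, sample_rate)
for amplitude, frequency, phase in peaks:
    print(f"{frequency:7.1f} Hz  amplitude {amplitude:.3f}  phase {phase:+.3f} rad")
```

Because both test components sit on bin centers, the estimates here are essentially exact; for arbitrary frequencies a practical system would refine each peak, for example by interpolating between neighboring bins.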

Audio signal processing based on sinusoidal analysis/synthesis

Embedded dual-rate sinusoidal transform coding

September 10, 1997

Conference Paper

Author:

Elliot Singer

…

Published in:

Proc. IEEE Workshop on Speech Coding for Telecommunications: Back to Basics: Attacking Fundamental Problems in Speech Coding, 7-10 September 1997, pp. 33-34.

Topic:

speech processing

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This paper describes the development of a dual-rate Sinusoidal Transform Coder in which a 2400 b/s coder is embedded as a separate packet in the 4800 b/s bit stream. The underlying coding structure provides the flexibility necessary for multirate speech coding and multimedia applications.

Low rate coding of the spectral envelope using channel gains

May 7, 1996

Conference Paper

Author:

Elliot Singer

…

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, 7-10 May 1996, pp. 769-772.

Topic:

speech processing

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

A dual-rate embedded sinusoidal transform coder is described in which a core 14th-order all-pole coder operating at 2400 b/s is augmented with a set of channel gain residuals in order to operate at the higher 4800 b/s rate. The channel gains are a set of non-uniformly spaced samples of the spline envelope and constitute a lowpass estimate of the short-time vocal tract magnitude spectrum. The channel gain residuals represent the difference between the spline envelope and the quantized 14th-order all-pole spectrum at the channel gain frequencies. The channel gain residuals are coded using pitch-dependent scalar quantization. Informal listening indicates that the quality of the embedded coder at 4800 b/s is comparable to that of an existing high-quality 4800 b/s all-pole coder.

Sine-wave amplitude coding using a mixed LSF/PARCOR representation

September 20, 1995

Conference Paper

Author:

Robert B. Dunn

…

Published in:

Proc. 1995 IEEE Workshop on Speech Coding for Telecommunications, 20-22 September 1995, pp. 77-78.

Topic:

speech processing

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

An all-pole model of the speech spectral envelope is used to code the sine-wave amplitudes in the Sinusoidal Transform Coder. While line spectral frequencies (LSFs) are currently used to represent this all-pole model, it is shown that a mixture of line spectral frequencies and partial correlation (PARCOR) coefficients can be used to reduce complexity without a loss in quantization efficiency. Objective and subjective measures demonstrate that speech quality is maintained. In addition, the use of split vector quantization is shown to substantially reduce the number of bits needed to code the all-pole model.
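As background for the PARCOR half of this mixed representation, the standard step-up and step-down recursions that relate PARCOR (reflection) coefficients to the all-pole predictor polynomial can be sketched as follows. This is a generic textbook recursion, not the coder's quantization scheme, and it does not show the LSF conversion. The function names and the sign convention A(z) = 1 + a1 z^-1 + ... + ap z^-p are assumptions; other texts use the opposite sign for the reflection coefficients.

```python
def parcor_to_lpc(reflection):
    """Step-up recursion: reflection coefficients -> [1, a1, ..., ap].

    Assumed convention: A_m(z) = A_{m-1}(z) + k_m * z^-m * A_{m-1}(1/z).
    """
    a = [1.0]
    for k_m in reflection:
        prev = a + [0.0]
        last = len(prev) - 1
        a = [prev[i] + k_m * prev[last - i] for i in range(len(prev))]
    return a

def lpc_to_parcor(a):
    """Step-down recursion: [1, a1, ..., ap] -> reflection coefficients.

    Requires every |k_m| < 1, i.e. a stable (minimum-phase) all-pole model.
    """
    a = list(a)
    reflection = []
    while len(a) > 1:
        k_m = a[-1]
        reflection.append(k_m)
        last = len(a) - 1
        denom = 1.0 - k_m * k_m
        a = [(a[i] - k_m * a[last - i]) / denom for i in range(last)]
    return reflection[::-1]

# Hypothetical third-order example; any |k| < 1 gives a stable model.
k_in = [0.5, -0.3, 0.2]
a = parcor_to_lpc(k_in)
k_out = lpc_to_parcor(a)
print(a)
print(k_out)
```

One practical appeal of PARCOR coefficients is that stability is easy to check: the synthesis filter is stable exactly when every coefficient has magnitude less than one.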

Sinusoidal coding

January 1, 1995

Book Chapter

Author:

Robert J. McAulay

…

Thomas F. Quatieri

Published in:

Chapter 4 in Speech Coding and Synthesis, Elsevier Science Publishers, 1995, pp. 121-173.

Topic:

speech processing

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

This chapter summarizes the sine-wave-based pitch extractor and the high-order all-pole modelling techniques that provided the basis for the multirate Sinusoidal Transform Coder and its application to multi-speaker conferencing.

Shape invariant time-scale and pitch modification of speech

March 1, 1992

Journal Article

Author:

Thomas F. Quatieri

…

Robert J. McAulay

Published in:

IEEE Trans. Signal Process., Vol. 40, No. 3, March 1992, pp. 497-510.

Topic:

speech enhancement

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

The simplified linear model of speech production predicts that when the rate of articulation is changed, the resulting waveform takes on the appearance of the original, except for a change in the time scale. The goal of this paper is to develop a time-scale modification system that preserves this shape-invariance property during voicing. This is done using a version of the sinusoidal analysis-synthesis system that models and independently modifies the phase contributions of the vocal tract and vocal cord excitation. An important property of the system is its capability of performing time-varying rates of change. Extensions of the method are applied to fixed and time-varying pitch modification of speech. The sine-wave analysis-synthesis system also allows for shape-invariant joint time-scale and pitch modification, and allows for the adjustment of the time scale and pitch according to speech characteristics such as the degree of voicing.

Low-rate speech coding based on the sinusoidal model

October 1, 1991

Book Chapter

Author:

Robert J. McAulay

…

Thomas F. Quatieri

Published in:

Chapter 6 in Advances in Speech Signal Processing, Marcel Dekker, Inc., 1992, pp. 165-208.

Topic:

speech processing

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

One approach to the problem of representation of speech signals is to use the speech production model in which speech is viewed as the result of passing a glottal excitation waveform through a time-varying linear filter that models the resonant characteristics of the vocal tract. In many applications it suffices to assume that the glottal excitation can be in one of two possible states corresponding to voiced or unvoiced speech. In attempts to design high-quality speech coders at the midband rates, generalizations of the binary excitation model have been developed. One such approach is multipulse (Atal and Remde, 1982) which uses more than one pitch pulse to model voiced speech and a possibly random set of pulses to model unvoiced speech. Code excited linear prediction (CELP) (Schroeder and Atal, 1985) is another representation which models the excitation as one of a number of random sequences or "codewords" superimposed on periodic pitch pulses. In this chapter the goal is also to generalize the model for the glottal excitation; but instead of using impulses as in multipulse or random sequences as in CELP, the excitation is assumed to be composed of sinusoidal components of arbitrary amplitudes, frequencies, and phases (McAulay and Quatieri, 1986).

Peak-to-rms reduction of speech based on a sinusoidal model

February 1, 1991

Journal Article

Author:

Thomas F. Quatieri

…

Robert J. McAulay

Published in:

IEEE Trans. Signal Process., Vol. 39, No. 2, February 1991, pp. 273-288.

Topic:

speech enhancement

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

In a number of applications, a speech waveform is processed using phase dispersion and amplitude compression to reduce its peak-to-rms ratio so as to increase loudness and intelligibility while minimizing perceived distortion. In this paper, a sinusoidal-based analysis/synthesis system is used to apply a radar design solution to the problem of dispersing the phase of a speech waveform. Unlike conventional methods of phase dispersion, this solution technique adapts dynamically to the pitch and spectral characteristics of the speech, while maintaining the original spectral envelope. The solution can also be used to drive the sine-wave amplitude modification for amplitude compression, and is coupled to the desired shaping of the speech spectrum. The new dispersion solution, when integrated with amplitude compression, results in a significant reduction in the peak-to-rms ratio of the speech waveform with acceptable loss in quality. Application of a real-time prototype sine-wave preprocessor to AM radio broadcasting is described.
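The effect of phase dispersion on the peak-to-rms ratio can be illustrated with a small numerical sketch. It does not reproduce the paper's radar-derived, pitch-adaptive solution: as a stand-in it uses a fixed quadratic (Schroeder-type) phase law on a flat-amplitude harmonic signal, and the function names, the 20 harmonics, and the 512-sample period are assumptions made for the example.

```python
import math

def peak_to_rms_db(x):
    """Peak-to-rms ratio (crest factor) of a sequence, in dB."""
    peak = max(abs(v) for v in x)
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    return 20.0 * math.log10(peak / rms)

def harmonic_signal(phases, period=512):
    """One period of a sum of unit-amplitude harmonics with the given phases."""
    return [sum(math.cos(2.0 * math.pi * k * t / period + phases[k - 1])
                for k in range(1, len(phases) + 1))
            for t in range(period)]

num_harmonics = 20

# All harmonics in phase: the cosines add coherently into one sharp pulse.
zero_phase = harmonic_signal([0.0] * num_harmonics)

# Quadratic phase across harmonics spreads the pulse energy over the period.
quadratic = [-math.pi * k * (k - 1) / num_harmonics
             for k in range(1, num_harmonics + 1)]
dispersed = harmonic_signal(quadratic)

zero_db = peak_to_rms_db(zero_phase)
dispersed_db = peak_to_rms_db(dispersed)
print(f"zero phase: {zero_db:.2f} dB, dispersed: {dispersed_db:.2f} dB")
```

Both signals have identical harmonic amplitudes, so the spectral envelope is unchanged; only the phase relations differ, which is the property that makes dispersion attractive for reducing peaks without reshaping the spectrum.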

Noise reduction using a soft-decision sine-wave vector quantizer

April 6, 1990

Conference Paper

Author:

Thomas F. Quatieri

…

Robert J. McAulay

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 2, Speech Processing 2; VLSI, Audio and Electroacoustics, 3-6 April 1990, pp. 821-824.

Topic:

speech enhancement

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

The need for noise reduction arises in speech communication channels, such as ground-to-air transmission and ground-based cellular radio, to improve vocoder quality and speech recognition accuracy. In this paper, noise reduction is performed in the context of a high-quality harmonic zero-phase sine-wave analysis/synthesis system which is characterized by sine-wave amplitudes, a voicing probability, and a fundamental frequency. Least-squared error estimation of a harmonic sine-wave representation leads to a "soft decision" template estimate consisting of sine-wave amplitudes and a voicing probability. The least-squares solution is modified to use template-matching with "nearest neighbors." The reconstruction is improved by using the modified least-squares solution only in spectral regions with low signal-to-noise ratio. The results, although preliminary, provide evidence that harmonic zero-phase sine-wave analysis/synthesis, combined with effective estimation of sine-wave amplitudes and probability of voicing, offers a promising approach to noise reduction.

Phase coherence in speech reconstruction for enhancement and coding applications

May 26, 1989

Conference Paper

Author:

Thomas F. Quatieri

…

Robert J. McAulay

Published in:

Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, Speech Processing 1, 23-26 May 1989, pp. 207-209.

Topic:

speech enhancement

R&D area:

Cyber Security and Information Sciences

R&D group:

Artificial Intelligence Technology and Systems

Summary

It has been shown that an analysis-synthesis system based on a sinusoidal representation leads to synthetic speech that is essentially perceptually indistinguishable from the original. A change in speech quality has been observed, however, when the phase relation of the sine waves is altered. This occurs in practice when sine waves are processed for speech enhancement (e.g., time-scale modification and reducing peak-to-RMS ratio) and for speech coding. This paper describes a zero-phase sinusoidal analysis-synthesis system which generates natural-sounding speech without the requirement of vocal tract phase. The method provides a basis for improving sound quality by providing different levels of phase coherence in speech reconstruction for time-scale modification, for a baseline system for coding, and for reducing the peak-to-RMS ratio by dispersion.
