Publications

An introduction to computing with neural nets

Published in:
IEEE ASSP Mag., Vol. 4, No. 2, April 1987, pp. 4-22.

Summary

Artificial neural net models have been studied for many years in the hope of achieving human-like performance in the fields of speech and image recognition. These models are composed of many nonlinear computational elements operating in parallel and arranged in patterns reminiscent of biological neural nets. Computational elements or nodes are connected via weights that are typically adapted during use to improve performance. There has been a recent resurgence in the field of artificial neural nets caused by new net topologies and algorithms, analog VLSI implementation techniques, and the belief that massive parallelism is essential for high performance speech and image recognition. This paper provides an introduction to the field of artificial neural nets by reviewing six important neural net models that can be used for pattern classification. These nets are highly parallel building blocks that illustrate neural net components and design principles and can be used to construct more complex systems. In addition to describing these nets, a major emphasis is placed on exploring how some existing classification and clustering algorithms can be performed using simple neuron-like components. Single-layer nets can implement algorithms required by Gaussian maximum-likelihood classifiers and optimum minimum-error classifiers for binary patterns corrupted by noise. More generally, the decision regions required by any classification algorithm can be generated in a straightforward manner by three-layer feed-forward nets.
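
The claim that a single-layer net can act as a minimum-error classifier for noisy binary patterns can be illustrated with a small sketch. It assumes patterns are coded as +1/-1, that each node's weights are half of one stored exemplar, and that each node adds a bias of N/2; under those assumptions a node's output equals N minus the Hamming distance to its exemplar. The function names and the example patterns are hypothetical and are not taken from the paper.

```python
def match_scores(exemplars, pattern):
    """One node per exemplar: weighted sum plus bias.

    Assumes +1/-1 coding, weights = exemplar / 2, bias = N / 2, so each
    score equals N minus the Hamming distance to that exemplar.
    """
    n = len(pattern)
    scores = []
    for exemplar in exemplars:
        weighted_sum = sum(0.5 * e * x for e, x in zip(exemplar, pattern))
        scores.append(weighted_sum + n / 2)
    return scores


def classify(exemplars, pattern):
    """Return the index of the node with the largest matching score."""
    scores = match_scores(exemplars, pattern)
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical stored exemplars and a noisy input (exemplar 0, one bit flipped).
EXEMPLARS = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [-1, -1, -1, -1, 1, 1, 1, 1],
]
NOISY = [1, 1, -1, 1, -1, -1, -1, -1]

print(match_scores(EXEMPLARS, NOISY))  # [7.0, 3.0, 1.0]
print(classify(EXEMPLARS, NOISY))      # 0
```

Picking the largest score is done here with a plain `max`; a net would need a further selection stage for that step, which this sketch does not model.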

Speech transformations based on a sinusoidal representation

Published in:
IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-34, No. 6, December 1986, pp. 1449-1464.

Summary

In this paper a new speech analysis/synthesis technique is presented which provides the basis for a general class of speech transformations including time-scale modification, frequency scaling, and pitch modification. These modifications can be performed with a time-varying change, permitting continuous adjustment of a speaker's fundamental frequency and rate of articulation. The method is based on a sinusoidal representation of the speech production mechanism which has been shown to produce synthetic speech that preserves the waveform shape and is perceptually indistinguishable from the original. Although the analysis/synthesis system was originally designed for single-speaker signals, it is also capable of recovering and modifying non-speech signals such as music, multiple speakers, marine biologic sounds, and speakers in the presence of interferences such as noise and musical backgrounds.

Speech analysis/synthesis based on a sinusoidal representation

Published in:
IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-34, No. 4, August 1986, pp. 744-754.

Summary

A sinusoidal model for the speech waveform is used to develop a new analysis/synthesis technique that is characterized by the amplitudes, frequencies, and phases of the component sine waves. These parameters are estimated from the short-time Fourier transform using a simple peak-picking algorithm. Rapid changes in the highly resolved spectral components are tracked using the concept of "birth" and "death" of the underlying sine waves. For a given frequency track a cubic function is used to unwrap and interpolate the phase such that the phase track is maximally smooth. This phase function is applied to a sine-wave generator, which is amplitude modulated and added to the other sine waves to give the final speech output. The resulting synthetic waveform preserves the general waveform shape and is essentially perceptually indistinguishable from the original speech. Furthermore, in the presence of noise the perceptual characteristics of the speech as well as the noise are maintained. In addition, it was found that the representation was sufficiently general that high-quality reproduction was obtained for a larger class of inputs including: two overlapping, superposed speech waveforms; music waveforms; speech in musical backgrounds; and certain marine biologic sounds. Finally, the analysis/synthesis system forms the basis for new approaches to the problems of speech transformations including time-scale and pitch-scale modification, and midrate speech coding.
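
The peak-picking step mentioned in this summary can be sketched for a single analysis frame. This is only an illustration under stated assumptions: a naive DFT, a periodic Hamming window, and a relative magnitude threshold are choices made here, not details given in the summary, and the names `pick_peaks` and `rel_threshold` are hypothetical. The "birth"/"death" tracking and the cubic phase interpolation are not shown.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform, enough for a short frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def pick_peaks(frame, rel_threshold=0.1):
    """Return (bin, magnitude, phase) for each local maximum of the spectrum.

    Assumptions: periodic Hamming window; peaks weaker than rel_threshold
    times the largest magnitude are ignored.
    """
    n = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * t / n) for t in range(n)]
    spectrum = dft([w * x for w, x in zip(window, frame)])
    mags = [abs(c) for c in spectrum[: n // 2 + 1]]
    floor = rel_threshold * max(mags)
    peaks = []
    for k in range(1, n // 2):
        if mags[k] >= floor and mags[k] > mags[k - 1] and mags[k] > mags[k + 1]:
            peaks.append((k, mags[k], cmath.phase(spectrum[k])))
    return peaks


# Hypothetical test frame: two sine waves centred on bins 5 and 12.
FRAME = [
    math.cos(2 * math.pi * 5 * t / 64) + 0.5 * math.cos(2 * math.pi * 12 * t / 64)
    for t in range(64)
]
PEAKS = pick_peaks(FRAME)
print([k for k, _, _ in PEAKS])  # [5, 12]
```

Each returned triple gives the frequency (as a bin index), amplitude (up to the window gain), and phase of one component sine wave for that frame.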

Robust HMM-based techniques for recognition of speech produced under stress and in noise

Published in:
Proc. Speech Tech '86, 28-30 April 1986, pp. 241-249.

Summary

Substantial improvements in speech recognition performance on speech produced under stress and in noise have been achieved through the development of techniques for enhancing the robustness of a baseline isolated-word Hidden Markov Model recognizer. The baseline HMM is a continuous-observation system using mel-frequency cepstra as the observation parameters. Enhancement techniques which were developed and tested include: placing a lower limit on the estimated variances of the observations; addition of temporal difference parameters; improved duration modelling; use of fixed diagonal covariance distance functions, with variances adjusted according to perceptual considerations; cepstral domain stress compensation; and multi-style training, where the system is trained on speech spoken with a variety of talking styles. With perceptually-motivated covariance and a combination of normal (single-frame) and differential cepstral observations, average error rates over five simulated-stress conditions were reduced from 20% (baseline) to 2.5% on a simulated-stress data base (105-word vocabulary, eight talkers, five conditions). With variance limiting, normal plus differential observations, and multi-style training, an error rate of 1.8% was achieved. Additional tests were conducted on a data base including nine talkers, eight talking styles, with speech produced under two levels of motor-workload stress. Substantial reductions in error rate were demonstrated for the noise and workload conditions, when multiple talking styles, rather than only normal speech, were used in training. In experiments conducted in simulated fighter cockpit noise, it was shown that error rates could be reduced significantly by training under multiple noise exposure conditions.
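
Two of the listed enhancements, variance limiting and temporal difference parameters, are simple enough to sketch. The summary does not say how the lower limit is chosen or how the differences are formed, so the floor value, the symmetric difference over `span` frames, and the edge handling below are all assumptions, and the function names are hypothetical.

```python
def floor_variances(variances, minimum):
    """Place a lower limit on estimated variances (the limit is an assumed value)."""
    return [max(v, minimum) for v in variances]


def add_differences(frames, span=1):
    """Append temporal difference parameters to each cepstral frame.

    Assumption: the difference is frame[t + span] - frame[t - span], with
    indices clamped at the ends of the utterance.
    """
    last = len(frames) - 1
    augmented = []
    for t, frame in enumerate(frames):
        ahead = frames[min(t + span, last)]
        behind = frames[max(t - span, 0)]
        delta = [a - b for a, b in zip(ahead, behind)]
        augmented.append(list(frame) + delta)
    return augmented


# Hypothetical two-coefficient frames, for illustration only.
FRAMES = [[1.0, 2.0], [2.0, 4.0], [4.0, 8.0]]
print(add_differences(FRAMES))
# [[1.0, 2.0, 1.0, 2.0], [2.0, 4.0, 3.0, 6.0], [4.0, 8.0, 2.0, 4.0]]
print(floor_variances([0.5, 0.001, 2.0], 0.01))  # [0.5, 0.01, 2.0]
```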

A new application of adaptive noise cancellation

Published in:
IEEE Trans. Acoust. Speech Signal Process., Vol. ASSP-34, No. 1, February 1986, pp. 21-27.

Summary

A new application of Widrow's adaptive noise cancellation (ANC) is presented in this paper. Specifically, the method is applied to the case where an acoustic barrier exists between the primary and reference microphones. By updating the coefficients of the noise estimation filter only during silence, it is shown that ANC can provide substantial noise reduction with little speech distortion even when the acoustic barrier provides only moderate attenuation of acoustic signals. The use of the modified ANC method is evaluated using an oxygen facemask worn by fighter aircraft pilots. Experiments demonstrate that if a noise field is created using a single source, 11 dB signal-to-noise ratio improvements can be achieved by attaching a reference microphone to the exterior of the facemask. The length of the ANC filter required for this particular environment is only 50 points.
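
The idea of updating the noise estimation filter only during silence can be sketched with a plain LMS update. This is a minimal sketch, not the paper's implementation: the step size, the form of the LMS update, the externally supplied `silence` flags, and the two-tap leakage path in the demonstration are all assumptions, and `gated_lms` is a hypothetical name. Only the 50-point default filter length comes from the summary.

```python
import random


def gated_lms(primary, reference, silence, taps=50, step=0.01):
    """Adaptive noise cancellation with coefficient updates gated by silence.

    primary:   samples from the primary microphone (speech plus noise)
    reference: samples from the reference microphone (noise)
    silence:   booleans, True where no speech is present (assumed given)
    Returns the noise-reduced output and the final filter coefficients.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    output = []
    for d, r, quiet in zip(primary, reference, silence):
        history = [r] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate  # the error signal is also the output
        if quiet:  # adapt only while the talker is silent
            weights = [w + step * error * h for w, h in zip(weights, history)]
        output.append(error)
    return output, weights


# Hypothetical demonstration: noise only, reaching the primary microphone
# through an assumed two-tap path.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
primary = [
    0.8 * reference[n] + (0.3 * reference[n - 1] if n else 0.0)
    for n in range(2000)
]
silence = [True] * 2000
cleaned, weights = gated_lms(primary, reference, silence, taps=4, step=0.05)
```

In this noise-only demonstration the coefficients settle near the assumed path, so the residual in `cleaned` shrinks toward zero. With speech present, the `silence` flags would freeze the coefficients so that the speech itself does not drive the adaptation.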

Adaptive noise cancellation in a fighter cockpit environment

Published in:
ICASSP'84, IEEE Int. Conf. on Acoustics, Speech and Signal Processing, 19-21 March 1984.

Summary

In this paper we discuss some preliminary results on using Widrow's Adaptive Noise Cancelling (ANC) algorithm to reduce the background noise present in a fighter pilot's speech. With a dominant noise source present and with the pilot wearing an oxygen facemask, we demonstrate that good (>10 dB) cancellation of the additive noise and little speech distortion can be achieved by having the reference microphone attached to the outside of the facemask and by updating the filter coefficients only during silence intervals.

Experience with speech communication in packet networks

Published in:
IEEE J. Sel. Areas Commun., Vol. SAC-1, No. 6, December 1983, pp. 963-980.

Summary

The integration of digital voice with data in a common packet-switched network system offers a number of potential benefits, including reduced systems cost through sharing of switching and transmission resources, flexible internetworking among systems utilizing different transmission media, and enhanced services for users requiring access to both voice and data communications. Issues which it has been necessary to address in order to realize these benefits include reconstitution of speech from packets arriving at nonuniform intervals, maximization of packet speech multiplexing efficiency, and determination of the implementation requirements for terminals and switching in a large-scale packet voice/data system. A series of packet speech systems experiments to address these issues has been conducted under the sponsorship of the Defense Advanced Research Projects Agency (DARPA). In the initial experiments on the ARPANET, the basic feasibility of speech communication on a store-and-forward packet network was demonstrated. Techniques were developed for reconstitution of speech from packets, and protocols were developed for call setup and for speech transport. Later speech experiments utilizing the Atlantic packet satellite network (SATNET) led to the development of techniques for efficient voice conferencing in a broadcast environment, and for internetting speech between a store-and-forward net (ARPANET) and a broadcast net (SATNET). Large-scale packet speech multiplexing experiments could not be carried out on ARPANET or SATNET where the network link capacities severely restrict the number of speech users that can be accommodated. However, experiments are currently being carried out using a wide-band satellite-based packet system designed to accommodate a sufficient number of simultaneous users to support realistic experiments in efficient statistical multiplexing. 
Key developments to date associated with the wide-band experiments have been 1) techniques for internetting via voice/data gateways from a variety of local access networks (packet cable, packet radio, and circuit-switched) to a long-haul broadcast satellite network and 2) compact implementations of packet voice terminals with full protocol and voice capabilities. Basic concepts and issues associated with packet speech systems are described. Requirements and techniques for speech processing, voice protocols, packetization and reconstitution, conferencing, and multiplexing are discussed in the context of a generic packet speech system configuration. Specific experimental configurations and key packet speech results on the ARPANET, SATNET, and wide-band system are reviewed.

Frequency sampling of the short-time Fourier-transform magnitude for signal reconstruction

Published in:
J. Opt. Soc. Amer., Vol. 73, November 1983, pp. 1523-1526.

Summary

Unique recovery of a signal from the magnitude (modulus) of the Fourier transform has been of long-standing interest in image and optical processing in which Fourier-transform phase is lost or difficult to measure. We investigate an alternative problem of recovering a signal from the Fourier-transform magnitude of overlapping regions of the signal, i.e., from the short-time (or -space) Fourier-transform magnitude. Recently it was established that a discrete-time signal x(n) can be uniquely obtained under mild restrictions from its short-time Fourier-transform magnitude. In this paper we extend this result to the case when the short-time Fourier-transform magnitude is known at only one or two frequencies for each n. We also present a recursive algorithm for recovering a sequence from such samples and demonstrate the algorithm with an example.

The Experimental Integrated Switched Network - a system-level network test facility

Published in:
Proc. 1983 IEEE Military Communications Conf., MILCOM, 31 October-2 November 1983.

Summary

An Experimental Integrated Switched Network (EISN) has been developed to provide a system-level testbed for the evaluation of advanced communications networking techniques, including survivable network routing algorithms using a mix of transmission media, for application in the Defense Switched Network (DSN). EISN includes five CONUS sites linked by a wideband demand-assigned satellite channel and by dialed-up terrestrial trunks for alternate satellite/terrestrial routing experiments. Experiments to date have validated techniques for integration of circuit-switched terrestrial systems with the demand-assigned satellite system, and for the establishment of alternate routes over satellite and terrestrial paths. Currently, candidate routing algorithms for application in the DSN are being implemented and tested using external routing/controller processors attached to digital circuit switches at EISN sites. In addition, EISN is also being used to support data communication experiments using DoD standard data protocols in a combined satellite/terrestrial network environment. Work is ongoing both in system experiments and in testbed developments to include additional capabilities. This paper represents a description and status report on both the testbed and the experimental efforts.

Object detection by two-dimensional linear prediction

Published in:
MIT Lincoln Laboratory Report TR-632

Summary

An important component of any automated image analysis system is the detection and classification of objects. In this report, we consider the first of these problems where the specific goal is to detect anomalous areas (e.g., man-made objects) in textured backgrounds such as trees, grass, and fields of aerial photographs. Our detection algorithm relies on a significance test which adapts itself to the changing background in such a way that a constant false alarm rate is maintained. Furthermore, this test has a potentially practical implementation since it can be expressed in terms of the residuals of an adaptive two-dimensional linear predictor. The algorithm is demonstrated with both synthetic and real-world images.