Publications

Demonstrations and applications of spoken language technology: highlights and perspectives from the 1993 ARPA Spoken Language Technology and Applications Day

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, Vol. 1, Speech Processing, 19-22 April 1994, pp. 337-340.

Summary

The ARPA Spoken Language Technology and Applications Day (SLTA'93) was a special workshop which presented a set of live, state-of-the-art demonstrations of speech recognition and spoken language understanding systems. The purpose of this paper is to provide perspective on current opportunities for applications which these systems can enable, and to review the application opportunities and needs cited by panelists and other members of the user community.

Opportunities for advanced speech processing in military computer-based systems

Published in:
Proc. IEEE, Vol. 79, No. 11, November 1991, pp. 1626-1641.

Summary

This paper presents a study of military applications of advanced speech processing technology which includes three major elements: 1) review and assessment of current efforts in military applications of speech technology; 2) identification of opportunities for future military applications of advanced speech technology; and 3) identification of problem areas where research in speech processing is needed to meet application requirements, and of current research thrusts which appear promising. The relationship of this study to previous assessments of military applications of speech technology is discussed and substantial recent progress is noted. Current efforts in military applications of speech technology which are highlighted include: 1) narrow-band (2400 b/s) and very low-rate (50-1200 b/s) secure voice communication; 2) voice/data integration in computer networks; 3) speech recognition in fighter aircraft, military helicopters, battle management, and air traffic control training systems; and 4) noise and interference removal for human listeners. Opportunities for advanced applications are identified by means of descriptions of several generic systems which would be possible with advances in speech technology and in system integration. These generic systems include: 1) an integrated multirate voice/data communications terminal; 2) an interactive speech enhancement system; 3) a voice-controlled pilot's associate system; 4) advanced air traffic control training systems; 5) a battle management command and control support system with spoken natural language interface; and 6) a spoken language translation system. In identifying problem areas and research efforts to meet application requirements, it is observed that some of the most promising research involves the integration of speech algorithm techniques including speech coding, speech recognition, and speaker recognition.

Automatic talker activity labeling for co-channel talker interference suppression

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Vol. 2, Speech Processing 2; VLSI; Audio and Electroacoustics, ICASSP, 3-6 April 1990, pp. 813-816.

Summary

This paper describes a speaker activity detector taking co-channel speech as input and labeling intervals of the input as target-only, jammer-only, or two-speaker (target+jammer). The algorithms applied were borrowed primarily from speaker recognition, thereby allowing us to use speaker-dependent test-utterance-independent information in a front-end for co-channel talker interference suppression. Parameters studied included classifier choice (vector quantization vs. Gaussian), training method (unsupervised vs. supervised), test utterance segmentation (uniform vs. adaptive), and training and testing target-to-jammer ratios. Using analysis interval lengths of 100 ms, performance reached 80% correct detection.

Robust speech recognition using hidden Markov models: overview of a research program

Summary

This report presents an overview of a program of speech recognition research which was initiated in 1985 with the major goal of developing techniques for robust high performance speech recognition under the stress and noise conditions typical of a military aircraft cockpit. The work on recognition in stress and noise during 1985 and 1986 produced a robust Hidden Markov Model (HMM) isolated-word recognition (IWR) system with 99 percent speaker-dependent accuracy for several difficult stress/noise data bases, and very high performance for normal speech. Robustness techniques which were developed and applied include multi-style training, robust estimation of parameter variances, perceptually-motivated stress-tolerant distance measures, use of time-differential speech parameters, and discriminant analysis. These techniques and others produced more than an order-of-magnitude reduction in isolated-word recognition error rate relative to a baseline HMM system. An important feature of the Lincoln HMM system has been the use of continuous-observation HMM techniques, which provide a good basis for the development of the robustness techniques, and avoid the need for a vector quantizer at the input to the HMM system. Beginning in 1987, the robust HMM system has been extended to continuous speech recognition for both speaker-dependent and speaker-independent tasks. The robust HMM continuous speech recognizer was integrated in real-time with a stressing simulated flight task, which was judged to be very realistic by a number of military pilots. Phrase recognition accuracy on the limited-task-domain (28-word vocabulary) flight task is better than 99.9 percent. Recently, the robust HMM system has been extended to large-vocabulary continuous speech recognition, and has yielded excellent performance in both speaker-dependent and speaker-independent recognition on the DARPA 1000-word vocabulary resource management data base. 
Current efforts include further improvements to the HMM system, techniques for the integration of speech recognition with natural language processing, and research on integration of neural network techniques with HMM.

Spoken language systems

Summary

Spoken language is the most natural and common form of human-human communication, whether face to face, over the telephone, or through various communication media such as radio and television. In contrast, human-machine interaction is currently achieved largely through keyboard strokes, pointing, or other mechanical means, using highly stylized languages. Communication, whether human-human or human-machine, suffers greatly when the two communicating agents do not "speak" the same language. The ultimate goal of work on spoken language systems is to overcome this language barrier by building systems that provide the necessary interpretive function between various languages, thus establishing spoken language as a versatile and natural communication medium between humans and machines and among humans speaking different languages.

Speech-state-adaptive simulation of co-channel talker interference suppression

Published in:
Proc. IEEE Int. Conf. on Acoustics, Speech and Signal Processing, ICASSP, 23-26 May 1989, pp. 361-364.

Summary

A co-channel talker interference suppression system processes an input waveform containing the sum of two simultaneous speech signals, referred to as the target and the jammer, to produce a waveform estimate of the target speech signal alone. This paper describes the evaluation of a simulated suppression system performing ideal suppression of a jammer signal given the voicing states (voiced, unvoiced, silent) of the target and jammer speech as a function of time and given the isolated target and jammer speech waveforms. By applying suppression to select regions of jammer speech as a function of the voicing states of the target and jammer, and by measuring the intelligibility of the resulting jammer suppressed co-channel speech, it is possible to identify those regions of co-channel speech on which interference suppression most improves intelligibility. Such results can help focus algorithm development efforts.

Robust HMM-based techniques for recognition of speech produced under stress and in noise

Published in:
Proc. Speech Tech '86, 28-30 April 1986, pp. 241-249.

Summary

Substantial improvements in speech recognition performance on speech produced under stress and in noise have been achieved through the development of techniques for enhancing the robustness of a baseline isolated-word Hidden Markov Model recognizer. The baseline HMM is a continuous-observation system using mel-frequency cepstra as the observation parameters. Enhancement techniques which were developed and tested include: placing a lower limit on the estimated variances of the observations; addition of temporal difference parameters; improved duration modelling; use of fixed diagonal covariance distance functions, with variances adjusted according to perceptual considerations; cepstral domain stress compensation; and multi-style training, where the system is trained on speech spoken with a variety of talking styles. With perceptually-motivated covariance and a combination of normal (single-frame) and differential cepstral observations, average error rates over five simulated-stress conditions were reduced from 20% (baseline) to 2.5% on a simulated-stress data base (105-word vocabulary, eight talkers, five conditions). With variance limiting, normal plus differential observations, and multi-style training, an error rate of 1.8% was achieved. Additional tests were conducted on a data base including nine talkers, eight talking styles, with speech produced under two levels of motor-workload stress. Substantial reductions in error rate were demonstrated for the noise and workload conditions, when multiple talking styles, rather than only normal speech, were used in training. In experiments conducted in simulated fighter cockpit noise, it was shown that error rates could be reduced significantly by training under multiple noise exposure conditions.

Experience with speech communication in packet networks

Published in:
IEEE J. Sel. Areas Commun., Vol. SAC-1, No. 6, December 1983, pp. 963-980.

Summary

The integration of digital voice with data in a common packet-switched network system offers a number of potential benefits, including reduced systems cost through sharing of switching and transmission resources, flexible internetworking among systems utilizing different transmission media, and enhanced services for users requiring access to both voice and data communications. Issues which it has been necessary to address in order to realize these benefits include reconstitution of speech from packets arriving at nonuniform intervals, maximization of packet speech multiplexing efficiency, and determination of the implementation requirements for terminals and switching in a large-scale packet voice/data system. A series of packet speech systems experiments to address these issues has been conducted under the sponsorship of the Defense Advanced Research Projects Agency (DARPA). In the initial experiments on the ARPANET, the basic feasibility of speech communication on a store-and-forward packet network was demonstrated. Techniques were developed for reconstitution of speech from packets, and protocols were developed for call setup and for speech transport. Later speech experiments utilizing the Atlantic packet satellite network (SATNET) led to the development of techniques for efficient voice conferencing in a broadcast environment, and for internetting speech between a store-and-forward net (ARPANET) and a broadcast net (SATNET). Large-scale packet speech multiplexing experiments could not be carried out on ARPANET or SATNET where the network link capacities severely restrict the number of speech users that can be accommodated. However, experiments are currently being carried out using a wide-band satellite-based packet system designed to accommodate a sufficient number of simultaneous users to support realistic experiments in efficient statistical multiplexing. 
Key developments to date associated with the wide-band experiments have been 1) techniques for internetting via voice/data gateways from a variety of local access networks (packet cable, packet radio, and circuit-switched) to a long-haul broadcast satellite network and 2) compact implementations of packet voice terminals with full protocol and voice capabilities. Basic concepts and issues associated with packet speech systems are described. Requirements and techniques for speech processing, voice protocols, packetization and reconstitution, conferencing, and multiplexing are discussed in the context of a generic packet speech system configuration. Specific experimental configurations and key packet speech results on the ARPANET, SATNET, and wide-band system are reviewed.

The Experimental Integrated Switched Network - a system-level network test facility

Published in:
Proc. 1983 IEEE Military Communications Conf., MILCOM, 31 October-2 November 1983.

Summary

An Experimental Integrated Switched Network (EISN) has been developed to provide a system-level testbed for the evaluation of advanced communications networking techniques, including survivable network routing algorithms using a mix of transmission media, for application in the Defense Switched Network (DSN). EISN includes five CONUS sites linked by a wideband demand-assigned satellite channel and by dialed-up terrestrial trunks for alternate satellite/terrestrial routing experiments. Experiments to date have validated techniques for integration of circuit-switched terrestrial systems with the demand-assigned satellite system, and for the establishment of alternate routes over satellite and terrestrial paths. Currently, candidate routing algorithms for application in the DSN are being implemented and tested using external routing/controller processors attached to digital circuit switches at EISN sites. EISN is also being used to support data communication experiments using DoD standard data protocols in a combined satellite/terrestrial network environment. Work is ongoing both in system experiments and in testbed developments to include additional capabilities. This paper presents a description and status report on both the testbed and the experimental efforts.

Voice communication in integrated digital voice and data networks

Published in:
IEEE Trans. Commun., Vol. COM-28, No. 9, September 1980, pp. 1478-1490.

Summary

Voice communication networks have traditionally been designed to provide either analog signal paths or fixed-rate synchronous digital connections between individual subscribers. These designs were aimed at accommodating the "streamlike" character of speech, which has traditionally been considered to flow from source to destination at a more or less constant rate. By way of contrast, interactive and computer-to-computer data transactions tend to be "bursty" in nature, and this has given rise to the development of packet-switching methods for data communications. The dichotomous nature of these two major traffic classes and the apparent conflict between the types of network services they require has resulted in the deployment of separate military communications facilities for voice and data. A challenge in the design of future systems is to achieve overall economy and flexibility in the allocation of resources via the efficient integration of both traffic types in common network facilities. This paper summarizes a number of advanced concepts for switching and flow control of combined voice and data traffic in integrated environments. Performance characteristics are described based on analysis results and computer simulation studies for both multilink terrestrial and broadcast satellite network topologies.