PURPOSE OF REVIEW: This paper describes recent advances in perceptual, acoustic, aerodynamic, and endoscopic imaging methods for assessing voice function. RECENT FINDINGS: We review advances from four major areas. PERCEPTUAL ASSESSMENT: Speech-language pathologists are being encouraged to use the new Consensus Auditory-Perceptual Evaluation of Voice (CAPE-V) inventory for auditory-perceptual assessment of voice quality, and recent studies have provided new insights into the listener reliability issues that have long plagued subjective perceptual judgments of voice quality. ACOUSTIC ASSESSMENT: Progress is being made on algorithms that are more robust for analyzing disordered voices, including the capability to extract voice quality-related measures from running speech segments. AERODYNAMIC ASSESSMENT: New devices for measuring phonation threshold air pressures and air flows have the potential to serve as sensitive indices of glottal phonatory conditions, and recent developments in aeroacoustic theory may provide new insights into laryngeal sound production mechanisms. ENDOSCOPIC IMAGING: The increased light sensitivity of new ultra-high-speed color digital video processors is enabling high-quality endoscopic imaging of vocal fold tissue motion at unprecedented image capture rates, which promises to provide new insights into the mechanisms of normal and disordered voice production. SUMMARY: Some of the recent research advances in voice function assessment could be more readily adopted into clinical practice, whereas others will require further development.
The current study investigates the synthesis and analysis of aspiration noise in synthesized and spoken vowels. Based on the linear source-filter model of speech production, we implement a vowel synthesizer in which the aspiration noise source is temporally modulated by the periodic source waveform. Modulations in the noise source waveform, and their synchrony with the periodic source, are shown to be salient for natural-sounding vowel synthesis. After developing the synthesis framework, we review past approaches to separating the two additive components of the model. A challenge for analysis based on this model is accurately estimating the aspiration noise component, which has energy across the entire frequency spectrum and temporal structure arising from modulations in the noise source. Spectral harmonic/noise component analysis of spoken vowels shows evidence of noise modulations, with peaks in the estimated noise source component synchronous both with the open phase of the periodic source and with instants of glottal closure.
Inspired by this observation of natural modulations in the aspiration noise source, we develop an alternative approach to the speech signal processing goal of accurate pitch-scale modification. The proposed strategy takes a dual processing approach, in which the periodic and noise components of the speech signal are separately analyzed, modified, and re-synthesized. The periodic component is modified using our implementation of time-domain pitch-synchronous overlap-add (TD-PSOLA), and the noise component is handled by modifying characteristics of its source waveform. Because we model an inherent coupling between the original periodic and aspiration noise sources, the modification algorithm is designed to preserve the synchrony between the temporal modulations of the two sources. The reconstructed modified signal is perceived as natural-sounding and generally exhibits fewer of the artifacts typically heard with current modification techniques.
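The core synthesis idea, a periodic glottal source plus aspiration noise whose amplitude is modulated by that same source, both passed through a vocal-tract filter, can be sketched as follows. This is a minimal illustration under stated assumptions, not the synthesizer described above: the Rosenberg-style pulse shape, the choice of multiplying white Gaussian noise by the glottal flow as the modulation envelope, the three formant frequencies and bandwidths, and the first-difference radiation model are all assumptions chosen for brevity, and the helper names (`rosenberg_pulse`, `resonator`, `synthesize_vowel`) are hypothetical. Because the envelope is the flow itself, this sketch only captures noise peaks during the open phase; it does not model the additional peaks near glottal closure reported above.

```python
import math
import random


def rosenberg_pulse(n_period, open_frac=0.6, close_frac=0.25):
    """One period of a Rosenberg-style glottal flow pulse (peak 1.0)."""
    n_open = int(open_frac * n_period)    # opening phase length (samples)
    n_close = int(close_frac * n_period)  # closing phase length (samples)
    pulse = []
    for n in range(n_period):
        if n < n_open:
            pulse.append(0.5 * (1.0 - math.cos(math.pi * n / n_open)))
        elif n < n_open + n_close:
            pulse.append(math.cos(0.5 * math.pi * (n - n_open) / n_close))
        else:
            pulse.append(0.0)  # closed phase: no flow
    return pulse


def resonator(x, freq, bw, fs):
    """Two-pole resonator for one formant, scaled to unity gain at DC."""
    r = math.exp(-math.pi * bw / fs)
    a1 = -2.0 * r * math.cos(2.0 * math.pi * freq / fs)
    a2 = r * r
    g = 1.0 + a1 + a2  # makes H(z=1) = g / (1 + a1 + a2) = 1
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = g * s - a1 * y1 - a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def synthesize_vowel(f0=120.0, fs=16000, dur=0.1, noise_gain=0.1,
                     formants=((700, 130), (1220, 70), (2600, 160)), seed=0):
    """Return (flow, noise, speech) for a steady vowel with modulated aspiration."""
    rng = random.Random(seed)
    n_period = int(round(fs / f0))
    pulse = rosenberg_pulse(n_period)
    n_total = int(round(dur * fs))
    flow = [pulse[n % n_period] for n in range(n_total)]
    # Aspiration noise: white Gaussian noise multiplied by the glottal flow,
    # so noise energy rises and falls in step with each open phase.
    noise = [noise_gain * flow[n] * rng.gauss(0.0, 1.0) for n in range(n_total)]
    source = [p + v for p, v in zip(flow, noise)]
    # Vocal-tract filter: cascade of formant resonators (assumed /a/-like values).
    out = source
    for freq, bw in formants:
        out = resonator(out, freq, bw, fs)
    # Lip radiation approximated by a first difference, then peak-normalize.
    speech = [out[0]] + [out[n] - out[n - 1] for n in range(1, n_total)]
    peak = max(abs(s) for s in speech) or 1.0
    return flow, noise, [s / peak for s in speech]
```

For example, `flow, noise, speech = synthesize_vowel(f0=110.0, noise_gain=0.2)` yields a breathier vowel whose noise is silent during every closed phase. Tying the noise envelope to the periodic source keeps the two synchronous by construction; a pitch-modification scheme that aims to preserve that synchrony would, in the same spirit, need to rebuild the noise envelope from the modified periodic source rather than time-shift the original noise.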