D. D. Mehta, D. D. Deliyski, S. M. Zeitels, T. F. Quatieri, and R. E. Hillman, “Voice production mechanisms following phonosurgical treatment of early glottic cancer,” Annals of Otology, Rhinology, and Laryngology, vol. 119, pp. 1-9, 2010.

    Objectives: Although near-normal conversational voices can be achieved with the phonosurgical management of early glottic cancer, there are still acoustic and aerodynamic deficits in vocal function that must be better understood to help further optimize phonosurgical interventions. Stroboscopic assessment is inadequate for this purpose. Methods: A newly developed color high-speed videoendoscopy (HSV) system that included time-synchronized recordings of the acoustic signal was used to perform a detailed examination of voice production mechanisms in 14 subjects. Digital image processing techniques were used to quantify glottal phonatory function and to delineate relationships between vocal fold vibratory properties and acoustic perturbation measures. Results: The results for multiple measurements of vibratory asymmetry showed that 31% to 62% of subjects displayed higher-than-normal average values, whereas the mean values for glottal closure duration (open quotient) and periodicity of vibration fell within normal limits. The average HSV-based measures did not correlate significantly with the acoustic perturbation measures, but moderate correlations were exhibited between the acoustic measures and the SDs of the HSV-based parameters. Conclusions: The use of simultaneous, time-synchronized HSV and acoustic recordings can provide new insights into postoperative voice production mechanisms that cannot be obtained with stroboscopic assessment.

    D. D. Mehta, D. D. Deliyski, and R. E. Hillman, “Commentary on why laryngeal stroboscopy really works: Clarifying misconceptions surrounding Talbot's law and the persistence of vision,” Journal of Speech, Language, and Hearing Research, vol. 53, no. 5, pp. 1263-1267, 2010.

    PURPOSE: The purpose of this article is to clear up misconceptions that have propagated in the clinical voice literature that inappropriately cite Talbot's law (1834) and the theory of persistence of vision as the scientific principles that underlie laryngeal stroboscopy. METHOD: After initial research into Talbot's (1834) original studies, it became clear that his experiments were not designed to explain why stroboscopy works. Subsequently, a comprehensive literature search was conducted for the purpose of investigating the general principles of stroboscopic imaging from primary sources. RESULTS: Talbot made no reference to stroboscopy in designing his experiments, and the notion of persistence of vision is not applicable to stroboscopic motion. Instead, two visual phenomena play critical roles: (a) the flicker-free perception of light and (b) the perception of apparent motion. In addition, the integration of stroboscopy with video-based technology in today's voice clinic requires additional complexities to include synchronization with camera frame rates. CONCLUSIONS: References to Talbot's law and the persistence of vision are not relevant to the generation of stroboscopic images. The critical visual phenomena are the flicker-free perception of light intensity and the perception of apparent motion from sampled images. A complete understanding of how laryngeal stroboscopy works will aid in better interpreting clinical findings during voice assessment.

    D. D. Mehta and R. E. Hillman, “Voice assessment: Updates on perceptual, acoustic, aerodynamic, and endoscopic imaging methods,” Current Opinion in Otolaryngology & Head and Neck Surgery, vol. 16, pp. 211-215, 2008.

    PURPOSE OF REVIEW: This paper describes recent advances in perceptual, acoustic, aerodynamic, and endoscopic imaging methods for assessing voice function. RECENT FINDINGS: We review advances from four major areas. PERCEPTUAL ASSESSMENT: Speech-language pathologists are being encouraged to use the new consensus auditory-perceptual evaluation of voice inventory for auditory-perceptual assessment of voice quality, and recent studies have provided new insights into listener reliability issues that have plagued subjective perceptual judgments of voice quality. ACOUSTIC ASSESSMENT: Progress is being made on the development of algorithms that are more robust for analyzing disordered voices, including the capability to extract voice quality-related measures from running speech segments. AERODYNAMIC ASSESSMENT: New devices for measuring phonation threshold air pressures and air flows have the potential to serve as sensitive indices of glottal phonatory conditions, and recent developments in aeroacoustic theory may provide new insights into laryngeal sound production mechanisms. ENDOSCOPIC IMAGING: The increased light sensitivity of new ultra high-speed color digital video processors is enabling high-quality endoscopic imaging of vocal fold tissue motion at unprecedented image capture rates, which promises to provide new insights into the mechanisms of normal and disordered voice production. SUMMARY: Some of the recent research advances in voice function assessment could be more readily adopted into clinical practice, whereas others will require further development.

    D. Mehta and T. F. Quatieri, “Aspiration noise during phonation: Synthesis, analysis, and pitch-scale modification,” Massachusetts Institute of Technology, 2006.

    The current study investigates the synthesis and analysis of aspiration noise in synthesized and spoken vowels. Based on the linear source-filter model of speech production, we implement a vowel synthesizer in which the aspiration noise source is temporally modulated by the periodic source waveform. Modulations in the noise source waveform and their synchrony with the periodic source are shown to be salient for natural-sounding vowel synthesis. After developing the synthesis framework, we research past approaches to separate the two additive components of the model. A challenge for analysis based on this model is the accurate estimation of the aspiration noise component that contains energy across the frequency spectrum and temporal characteristics due to modulations in the noise source. Spectral harmonic/noise component analysis of spoken vowels shows evidence of noise modulations with peaks in the estimated noise source component synchronous with both the open phase of the periodic source and with time instants of glottal closure. Inspired by this observation of natural modulations in the aspiration noise source, we develop an alternate approach to the speech signal processing aim of accurate pitch-scale modification. The proposed strategy takes a dual processing approach, in which the periodic and noise components of the speech signal are separately analyzed, modified, and re-synthesized. The periodic component is modified using our implementation of time-domain pitch-synchronous overlap-add, and the noise component is handled by modifying characteristics of its source waveform. Since we have modeled an inherent coupling between the original periodic and aspiration noise sources, the modification algorithm is designed to preserve the synchrony between temporal modulations of the two sources. The reconstructed modified signal is perceived to be natural-sounding and generally reduces artifacts that are typically heard in current modification techniques.
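    The synthesis idea in this abstract (a noise source whose amplitude follows the periodic source, with both passed through a vocal tract filter) can be illustrated with a minimal sketch. This is not the authors' synthesizer: the raised-cosine pulse shape, the modulation depth, the noise gain, the single two-pole formant, and all function names are assumptions chosen for illustration, and only the open-phase modulation is modeled, not the noise peaks at glottal closure.

```python
import math
import random


def glottal_pulse_train(f0, fs, n, open_quotient=0.6):
    """Crude periodic source (assumed shape): raised-cosine pulse in the
    open phase of each cycle, zero during the closed phase."""
    period = fs / f0  # samples per glottal cycle
    out = []
    for i in range(n):
        phase = (i % period) / period  # position within the cycle, 0..1
        if phase < open_quotient:
            out.append(0.5 * (1.0 - math.cos(2.0 * math.pi * phase / open_quotient)))
        else:
            out.append(0.0)
    return out


def modulated_noise(periodic, depth=0.8, gain=0.1, seed=0):
    """White Gaussian noise whose envelope follows the periodic source.
    depth=0 gives unmodulated noise; depth=1 silences the closed phase."""
    rng = random.Random(seed)
    peak = max(periodic) or 1.0
    return [
        gain * ((1.0 - depth) + depth * p / peak) * rng.gauss(0.0, 1.0)
        for p in periodic
    ]


def resonator(x, fc, bw, fs):
    """Two-pole resonator standing in for one formant of the vocal tract."""
    r = math.exp(-math.pi * bw / fs)  # pole radius set by the bandwidth
    a1 = -2.0 * r * math.cos(2.0 * math.pi * fc / fs)
    a2 = r * r
    y1 = y2 = 0.0
    out = []
    for s in x:
        y = s - a1 * y1 - a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def synthesize_vowel(f0=100.0, fs=8000, dur=0.5, formant=(700.0, 100.0)):
    """Additive source (periodic + modulated noise) through a linear filter."""
    n = int(fs * dur)
    periodic = glottal_pulse_train(f0, fs, n)
    noise = modulated_noise(periodic)
    source = [p + q for p, q in zip(periodic, noise)]
    return resonator(source, formant[0], formant[1], fs), periodic, noise
```

    Calling `synthesize_vowel()` returns the filtered signal together with the two source components, so the noise envelope can be compared against the periodic waveform cycle by cycle.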