A model-based inverse filtering scheme is proposed for an accurate, non-invasive estimation of the aerodynamic source of voiced sounds at the glottis. The approach, referred to as subglottal impedance-based inverse filtering (IBIF), takes as input the signal from a lightweight accelerometer placed on the skin over the extrathoracic trachea and yields estimates of glottal airflow and its time derivative, offering important advantages over traditional methods that deal with the supraglottal vocal tract. The proposed scheme is based on mechano-acoustic impedance representations from a physiologically-based transmission line model and a lumped skin surface representation. A subject-specific calibration protocol is used to account for individual adjustments of subglottal impedance parameters and mechanical properties of the skin. Preliminary results for sustained vowels with various voice qualities show that the subglottal IBIF scheme yields comparable estimates with respect to current aerodynamics-based methods of clinical vocal assessment. A mean absolute error of less than 10% was observed for two glottal airflow measures—maximum flow declination rate and amplitude of the modulation component—that have been associated with the pathophysiology of some common voice disorders caused by faulty and/or abusive patterns of vocal behavior (i.e., vocal hyperfunction). The proposed method further advances the ambulatory assessment of vocal function based on the neck acceleration signal, that previously have been limited to the estimation of phonation duration, loudness, and pitch. Subglottal IBIF is also suitable for other ambulatory applications in speech communication, in which further evaluation is underway.
PURPOSE OF REVIEW: To summarize recent technological advancements and insight into the role of stroboscopy in laryngeal imaging. RECENT FINDINGS: Although stroboscopic technology has not undergone major technological improvements, recent clarifications have been made to the application of stroboscopic principles to video-based laryngeal imaging. Also recent advances in coupling stroboscopy with high-definition video cameras provide higher spatial resolution of vocal fold vibratory function during phonation. Studies indicate that the interrater reliability of visual stroboscopic assessment varies depending on the laryngeal feature being rated and that only a subset of features may be needed to be representative of an entire assessment. High-speed videoendoscopy (HSV) judgments have been shown to be more sensitive than stroboscopy for evaluating vocal fold phase asymmetry, pointing to the future potential of complementing stroboscopy with alternative imaging modalities in hybrid systems. Laryngeal videostroboscopy alone continues to play a central role in clinical voice assessment. Even though HSV may provide more detailed information about phonatory function, its eventual clinical adoption will depend on how remaining practical, technical, and methodological challenges will be met. SUMMARY: Laryngeal videostroboscopy continues to be the modality of choice for imaging vocal fold vibration, but technological advancements in HSV and associated research findings are driving increased interest in the clinical adoption of HSV to complement videostroboscopic assessment.
In this article, we provide a brief summary of the major technological advances that led to current methods for imaging vocal fold vibration during phonation including the development of indirect laryngoscopy, imaging of rapid motion, fiber optics, and digital image capture. We also provide a brief overview of new emerging technologies that could be used in the future for voice research and clinical voice assessment, including advances in laryngeal high-speed videoendoscopy, depth-kymography, and dynamic optical coherence tomography.
Many common voice disorders are chronic or recurring conditions that are likely to result from faulty and/or abusive patterns of vocal behavior, referred to generically as vocal hyperfunction. An ongoing goal in clinical voice assessment is the development and use of noninvasively derived measures to quantify and track the daily status of vocal hyperfunction so that the diagnosis and treatment of such behaviorally based voice disorders can be improved. This paper reports on the development of a new, versatile, and cost-effective clinical tool for mobile voice monitoring that acquires the high-bandwidth signal from an accelerometer sensor placed on the neck skin above the collarbone. Using a smartphone as the data acquisition platform, the prototype device provides a user-friendly interface for voice use monitoring, daily sensor calibration, and periodic alert capabilities. Pilot data are reported from three vocally normal speakers and three subjects with voice disorders to demonstrate the potential of the device to yield standard measures of fundamental frequency and sound pressure level and model-based glottal airflow properties. The smartphone-based platform enables future clinical studies for the identification of the best set of measures for differentiating between normal and hyperfunctional patterns of voice use.
Purpose: In prior work, a manually derived measure of vocal fold vibratory phase asymmetry correlated to varying degrees with visual judgments made from laryngeal high-speed videoendoscopy (HSV) recordings. This investigation extended this work by establishing an automated HSV-based framework to quantify 3 categories of vocal fold vibratory asymmetry. Method: HSV-based analysis provided for cycle-to-cycle estimates of left-right phase asymmetry, left-right amplitude asymmetry, and axis shift during glottal closure for 52 speakers with no vocal pathology producing comfortable and pressed phonation. An initial cross-validation of the automated left-right phase asymmetry measure was performed by correlating the measure with other objective and subjective assessments of phase asymmetry. Results: Vocal fold vibratory asymmetry was exhibited to a similar extent in both comfortable and pressed phonations. The automated measure of left-right phase asymmetry strongly correlated with manually derived measures and moderately correlated with visual-perceptual ratings. Correlations with the visual-perceptual ratings remained relatively consistent as the automated measure was derived from kymograms taken at different glottal locations. Conclusions: An automated HSV-based framework for the quantification of vocal fold vibratory asymmetry was developed and initially validated. This framework serves as a platform for investigating relationships between vocal fold tissue motion and acoustic measures of voice function.
Vocal fold vibratory asymmetry is often associated with inefficient sound production through its impact on source spectral tilt. This association is investigated in both a computational voice production model and a group of 47 human subjects. The model provides indirect control over the degree of left-right phase asymmetry within a nonlinear source-filter framework, and high-speed videoendoscopy provides in vivo measures of vocal fold vibratory asymmetry. Source spectral tilt measures are estimated from the inverse-filtered spectrum of the simulated and recorded radiated acoustic pressure. As expected, model simulations indicate that increasing left-right phase asymmetry induces steeper spectral tilt. Subject data, however, reveal that none of the vibratory asymmetry measures correlates with spectral tilt measures. Probing further into physiological correlates of spectral tilt that might be affected by asymmetry, the glottal area waveform is parameterized to obtain measures of the open phase (open/plateau quotient) and closing phase (speed/closing quotient). Subjects' left-right phase asymmetry exhibits low, but statistically significant, correlations with speed quotient (r=0.45) and closing quotient (r=-0.39). Results call for future studies into the effect of asymmetric vocal fold vibration on glottal airflow and the associated impact on voice source spectral properties and vocal efficiency.