High-Speed Video

D. D. Mehta, et al., “Direct measurement and modeling of intraglottal, subglottal, and vocal fold collision pressures during phonation in an individual with a hemilaryngectomy,” Applied Sciences, vol. 11, no. 16, pp. 7256, 2021.
The purpose of this paper is to report on the first in vivo application of a recently developed transoral, dual-sensor pressure probe that directly measures intraglottal, subglottal, and vocal fold collision pressures during phonation. Synchronous measurement of intraglottal and subglottal pressures was accomplished using two miniature pressure sensors mounted on the end of the probe and inserted transorally in a 78-year-old male who had previously undergone surgical removal of his right vocal fold for treatment of laryngeal cancer. The endoscopist used one hand to position the custom probe against the surgically medialized scar band that replaced the right vocal fold and used the other hand to position a transoral endoscope to record laryngeal high-speed videoendoscopy of the vibrating left vocal fold contacting the pressure probe. Visualization of the larynx during sustained phonation allowed the endoscopist to place the dual-sensor pressure probe such that the proximal sensor was positioned intraglottally and the distal sensor subglottally. The proximal pressure sensor was verified to be in the strike zone of vocal fold collision during phonation when the intraglottal pressure signal exhibited three characteristics: an impulsive peak at the start of the closed phase, a rounded peak during the open phase, and a minimum value around zero immediately preceding the impulsive peak of the subsequent phonatory cycle. Numerical voice production modeling was applied to validate model-based predictions of vocal fold collision pressure using kinematic vocal fold measures. The results successfully demonstrated feasibility of in vivo measurement of vocal fold collision pressure in an individual with a hemilaryngectomy, motivating ongoing data collection that is designed to aid in the development of vocal dose measures that incorporate vocal fold impact collision and stresses.
H. Ghasemzadeh, D. D. Deliyski, R. E. Hillman, and D. D. Mehta, “Method for horizontal calibration of laser-projection transnasal fiberoptic high-speed videoendoscopy,” Applied Sciences, vol. 11, no. 2, pp. 822, 2021.

Objective: Calibrated horizontal measurements (e.g., mm) from endoscopic procedures could be utilized for advancement of evidence-based practice and personalized medicine. However, the size of an object in endoscopic images is not readily calibrated and depends on multiple factors, including the distance between the endoscope and the target surface. Additionally, acquired images may have significant non-linear distortion that would further complicate calibrated measurements. This study used a recently developed in-vivo laser-projection fiberoptic laryngoscope and proposes a method for calibrated spatial measurements.

Method: A set of circular grids were recorded at multiple working distances. A statistical model was trained that would map from pixel length of the object, the working distance, and the spatial location of the target object into its mm length.

Result: A detailed analysis of the performance of the proposed method is presented. The analyses have shown that the accuracy of the proposed method does not depend on the working distance and length of the target object. The estimated average magnitude of error was 0.27 mm, which is three times lower than the existing alternative.

Conclusion: The presented method can achieve sub-millimeter accuracy in horizontal measurement.

Significance: Evidence-based practice and personalized medicine could significantly benefit from the proposed method. Implications of the findings for other endoscopic procedures are also discussed.

Keywords: Flexible endoscopy; High-speed videoendoscopy; Horizontal calibrated measurements; Image distortion; Instrumental voice assessment; Laser calibration; Laser projection endoscope.
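
As a rough illustration of the mapping described in the Method paragraph above, the sketch below fits a straight line relating millimeters per pixel to working distance from calibration-grid recordings, then converts a pixel length to millimeters. This is a simplified assumption, not the authors' statistical model: it ignores the spatial location of the object and any non-linear lens distortion, and the function names and sample values are hypothetical.

```python
def fit_scale(distances_mm, mm_per_pixel):
    """Least-squares fit of mm_per_pixel = a * distance + b.

    Both inputs are hypothetical calibration data: one scale value
    (mm per pixel) measured from a grid at each working distance.
    """
    n = len(distances_mm)
    mean_d = sum(distances_mm) / n
    mean_s = sum(mm_per_pixel) / n
    sxx = sum((d - mean_d) ** 2 for d in distances_mm)
    sxy = sum((d - mean_d) * (s - mean_s)
              for d, s in zip(distances_mm, mm_per_pixel))
    a = sxy / sxx
    b = mean_s - a * mean_d
    return a, b


def pixels_to_mm(pixel_length, distance_mm, a, b):
    """Convert a pixel length to mm at a given working distance."""
    return pixel_length * (a * distance_mm + b)


# Made-up calibration values, for illustration only.
a, b = fit_scale([5.0, 10.0, 15.0, 20.0], [0.02, 0.03, 0.04, 0.05])
length_mm = pixels_to_mm(100, 12.0, a, b)
```

A model closer to the one summarized in the abstract would also take the object's position in the image as an input, since distortion varies across the field of view.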

D. D. Deliyski, et al., “Laser-calibrated system for transnasal fiberoptic laryngeal high-speed videoendoscopy,” Journal of Voice, vol. 35, no. 1, pp. 122-128, 2021.
G. A. Alzamendi, et al., “Bayesian estimation of vocal function measures using laryngeal high-speed videoendoscopy and glottal airflow estimates: An in vivo case study,” The Journal of the Acoustical Society of America, vol. 147, no. 5, pp. EL434-EL439, 2020.
This study introduces the in vivo application of a Bayesian framework to estimate subglottal pressure, laryngeal muscle activation, and vocal fold contact pressure from calibrated transnasal high-speed videoendoscopy and oral airflow data. A subject-specific, lumped-element vocal fold model is estimated using an extended Kalman filter and two observation models involving glottal area and glottal airflow. Model-based inferences using data from a vocally healthy male individual are compared with empirical estimates of subglottal pressure and reference values for muscle activation and contact pressure in the literature, thus providing baseline error metrics for future clinical investigations.
J. T. Heaton, et al., “Aerodynamically driven phonation of individual vocal folds under general anesthesia in canines,” The Laryngoscope, vol. 130, no. 8, pp. 1980-1988, 2020.


Objectives

We previously developed an instrument called the Aerodynamic Vocal Fold Driver (AVFD) for intraoperative magnified assessment of vocal fold (VF) vibration during microlaryngoscopy under general anesthesia. Excised larynx testing showed that the AVFD could provide useful information about the vibratory characteristics of each VF independently. The present investigation expands those findings by testing new iterations of the AVFD during microlaryngoscopy in the canine model.

Study Design

Animal model.

Methods

The AVFD is a handheld instrument that is positioned to contact the phonatory mucosa of either VF during microlaryngoscopy. Airflow delivered through the AVFD shaft to the subglottis drives the VF into phonation-like vibration, which enables magnified observation of mucosal-wave function with stroboscopy or high-speed video. AVFD-driven phonation was tested intraoperatively (n = 26 VFs) using either the original instrument design or smaller and larger versions three-dimensionally printed from a medical grade polymer. A high-fidelity pressure sensor embedded within the AVFD measured VF contact pressure. Characteristics of individual VF phonation were compared with typical two-fold phonation and compared for VFs scarred by electrocautery (n = 4) versus controls (n = 22).

Results

Phonation was successful in all 26 VFs, even when scar prevented conventional bilateral phonation. The 15-mm-wide AVFD fits best within the anteroposterior dimension of the musculo-membranous VF, and VF contact pressure correlated with acoustic output, driving pressures, and visible modes of vibration.

Conclusions

The AVFD can reveal magnified vibratory characteristics of individual VFs during microlaryngoscopy (e.g., without needing patient participation), potentially providing information that is not apparent or available during conventional awake phonation, which might facilitate phonosurgical decision making.

Level of Evidence


H. Ghasemzadeh, D. D. Deliyski, D. S. Ford, J. B. Kobler, R. E. Hillman, and D. D. Mehta, “Method for vertical calibration of laser-projection transnasal fiberoptic high-speed videoendoscopy,” Journal of Voice, vol. 34, no. 6, pp. 847-861, 2020.
The ability to provide absolute calibrated measurement of the laryngeal structures during phonation is of paramount importance to voice science and clinical practice. Calibrated three-dimensional measurement could provide essential information for modeling purposes, for studying the developmental aspects of vocal fold vibration, for refining functional voice assessment and treatment outcomes evaluation, and for more accurate staging and grading of laryngeal disease. Recently, a laser-calibrated transnasal fiberoptic endoscope compatible with high-speed videoendoscopy (HSV) and capable of providing three-dimensional measurements was developed. The optical principle employed is to project a grid of 7 × 7 green laser points across the field of view (FOV) at an angle relative to the imaging axis, such that (after calibration) the position of each laser point within the FOV encodes the vertical distance from the tip of the endoscope to the laryngeal tissues. The purpose of this study was to develop a precise method for vertical calibration of the endoscope. Investigating the position of the laser points showed that, besides the vertical distance, they also depend on the parameters of the lens coupler, including the FOV position within the image frame and the rotation angle of the endoscope. The presented automatic calibration method was developed to compensate for the effect of these parameters. Statistical image processing and pattern recognition were used to detect the FOV, the center of FOV, and the fiducial marker. This step normalizes the HSV frames to a standard coordinate system and removes the dependence of the laser-point positions on the parameters of the lens coupler. Then, using a statistical learning technique, a calibration protocol was developed to model the trajectories of all laser points as the working distance was varied. Finally, a set of experiments was conducted to measure the accuracy and reliability of every step of the procedure. 
The system was able to measure absolute vertical distance with mean percent error in the range of 1.7% to 4.7%, depending on the working distance.
M. E. Powell, et al., “Efficacy of videostroboscopy and high-speed videoendoscopy to obtain functional outcomes from perioperative ratings in patients with mass lesions,” Journal of Voice, vol. 34, no. 5, pp. 769-782, 2020.



Objectives: A major limitation of comparing the efficacy of videostroboscopy (VS) and high-speed videoendoscopy (HSV) is the lack of an objective reference by which to compare the functional assessment ratings of the two techniques. For patients with vocal fold mass lesions, intraoperative measures of lesion size and depth may serve as this objective reference. This study compared the relationships between the pre- to postoperative change in VS and HSV visual-perceptual ratings to intraoperative measures of lesion size and depth.

Study Design: Prospective visual-perceptual study with intraoperative measures of lesion size and depth.

Methods: VS and HSV samples were obtained preoperatively and postoperatively from 28 patients with vocal fold lesions and from 17 vocally healthy controls. Two experienced clinicians rated amplitude, mucosal wave, vertical phase difference, left-right phase asymmetry, and vocal fold edge on a visual-analog scale using both imaging techniques. The change in perioperative ratings from VS and HSV was compared between groups and correlated to intraoperative measures of lesion size and depth.

Results: HSV was as reliable as VS for ratings of amplitude and edge, and substantially more reliable for ratings of mucosal wave and left-right phase asymmetry. Both VS and HSV had mild-moderate correlations between change in perioperative ratings and intraoperative measures of lesion area. Change in function could be obtained in more patients and for more parameters using HSV than VS. Group differences were noted for postoperative ratings of amplitude and edge; however, these differences were within one level of the visual-perceptual rating scale. The presence of asynchronicity in VS recordings renders vibratory features either uninterpretable or potentially distorted and thus should not be rated.

Conclusions: Amplitude and edge are robust vibratory measures for perioperative functional assessment, regardless of imaging modality. HSV is indicated for evaluation of subepithelial lesions or if asynchronicity is present in the VS image sequence.

    D. D. Mehta, et al., “Toward development of a vocal fold contact pressure probe: Bench-top validation of a dual-sensor probe using excised human larynx models,” Applied Sciences, vol. 9, no. 20, pp. 4360, 2019.
    A critical element in understanding voice production mechanisms is the characterization of vocal fold collision, which is widely considered a primary etiological factor in the development of common phonotraumatic lesions such as nodules and polyps. This paper describes the development of a transoral, dual-sensor intraglottal/subglottal pressure probe for the simultaneous measurement of vocal fold collision and subglottal pressures during phonation using two miniature sensors positioned 7.6 mm apart at the distal end of a rigid cannula. Proof-of-concept testing was performed using excised whole-mount and hemilarynx human tissue aerodynamically driven into self-sustained oscillation, with systematic variation of the superior–inferior positioning of the vocal fold collision sensor. In the hemilarynx experiment, signals from the pressure sensors were synchronized with an acoustic microphone, a tracheal-surface accelerometer, and two high-speed video cameras recording at 4000 frames per second for top–down and en face imaging of the superior and medial vocal fold surfaces, respectively. As expected, the intraglottal pressure signal exhibited an impulse-like peak when vocal fold contact occurred, followed by a broader peak associated with intraglottal pressure build-up during the de-contacting phase. As subglottal pressure was increased, the peak amplitude of the collision pressure increased and typically reached a value below that of the average subglottal pressure. Results provide important baseline vocal fold collision pressure data with which computational models of voice production can be developed and in vivo measurements can be referenced.
    M. Motie-Shirazi, et al., “Toward development of a vocal fold contact pressure probe: Sensor characterization and validation using synthetic vocal fold models,” Applied Sciences, vol. 9, no. 15, pp. 3002, 2019.
    Excessive vocal fold collision pressures during phonation are considered to play a primary role in the formation of benign vocal fold lesions, such as nodules. The ability to accurately and reliably acquire intraglottal pressure has the potential to provide unique insights into the pathophysiology of phonotrauma. Difficulties arise, however, in directly measuring vocal fold contact pressures due to physical intrusion from the sensor that may disrupt the contact mechanics, as well as difficulty in determining probe/sensor position relative to the contact location. These issues are quantified and addressed through the implementation of a novel approach for identifying the timing and location of vocal fold contact, and measuring intraglottal and vocal fold contact pressures via a pressure probe embedded in the wall of a hemi-laryngeal flow facility. The accuracy and sensitivity of the pressure measurements are validated against ground truth values. Application to in vivo approaches are assessed by acquiring intraglottal and VF contact pressures using a synthetic, self-oscillating vocal fold model in a hemi-laryngeal configuration, where the sensitivity of the measured intraglottal and vocal fold contact pressure relative to the sensor position is explored.
    M. E. Powell, D. D. Deliyski, R. E. Hillman, S. M. Zeitels, J. A. Burns, and D. D. Mehta, “Comparison of videostroboscopy to stroboscopy derived from high-speed videoendoscopy for evaluating patients with vocal fold mass lesions,” American Journal of Speech-Language Pathology, vol. 25, pp. 576-589, 2016.

    Purpose Videostroboscopy (VS) uses an indirect physiological signal to predict the phase of the vocal fold vibratory cycle for sampling. Simulated stroboscopy (SS) extracts the phase of the glottal cycle directly from the changing glottal area in the high-speed videoendoscopy (HSV) image sequence. The purpose of this study is to determine the reliability of SS relative to VS for clinical assessment of vocal fold vibratory function in patients with mass lesions.

    Methods VS and SS recordings were obtained from 28 patients with vocal fold mass lesions before and after phonomicrosurgery and 17 controls who were vocally healthy. Two clinicians rated clinically relevant vocal fold vibratory features using both imaging techniques, indicated their internal level of confidence in the accuracy of their ratings, and provided reasons for low or no confidence.

    Results SS had fewer asynchronous image sequences than VS. Vibratory outcomes were able to be computed for more patients using SS. In addition, raters demonstrated better interrater reliability and reported equal or higher levels of confidence using SS than VS.

    Conclusion Stroboscopic techniques on the basis of extracting the phase directly from the HSV image sequence are more reliable than acoustic-based VS. Findings suggest that SS derived from high-speed videoendoscopy is a promising improvement over current VS systems.
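
The idea of deriving stroboscopic frames from the glottal area, as summarized in this abstract, can be sketched roughly as follows. This is only an illustrative assumption of how such sampling might work, not the procedure used in the study: cycle boundaries are taken where a hypothetical glottal area waveform rises through its mean, and one frame is picked per cycle at a phase that advances slightly from cycle to cycle. The names `cycle_starts`, `strobe_frames`, and `steps_per_sweep` are invented for this example.

```python
def cycle_starts(area):
    """Frame indices where the glottal area rises through its mean value.

    `area` is a hypothetical per-frame glottal area sequence; each upward
    mean crossing is treated as the start of one glottal cycle.
    """
    mean_area = sum(area) / len(area)
    return [i for i in range(1, len(area))
            if area[i - 1] < mean_area <= area[i]]


def strobe_frames(area, steps_per_sweep):
    """Pick one frame per cycle, advancing the sampled phase each cycle.

    After `steps_per_sweep` cycles the phase wraps around, so the selected
    frames play back as a slow-motion view of one composite cycle.
    """
    starts = cycle_starts(area)
    frames = []
    for k in range(len(starts) - 1):
        start, end = starts[k], starts[k + 1]
        # Integer arithmetic keeps the frame offset exact.
        offset = (k % steps_per_sweep) * (end - start) // steps_per_sweep
        frames.append(start + offset)
    return frames


# Made-up triangular area waveform: 10 cycles of 6 frames each.
area = [0, 1, 2, 3, 2, 1] * 10
selected = strobe_frames(area, steps_per_sweep=3)
```

Because the phase comes from the image sequence itself, a sketch like this does not rely on a separate acoustic or other indirect signal for triggering.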

    C. E. Stepp, M. Zañartu, D. D. Mehta, and R. E. Hillman, “Hyperfunctional voice disorders: Current results, clinical implications, and future directions of a multidisciplinary research program,” Proceedings of the Annual Convention of the American Speech-Language-Hearing Association, 2016.
    N. Iftimia, G. Maguluri, E. Chang, J. Park, J. Kobler, and D. Mehta, “Dynamic vocal fold imaging with combined optical coherence tomography/high-speed video endoscopy,” Proceedings of the 10th International Conference on Voice Physiology and Biomechanics, pp. 1-2, 2016.
    G. Maguluri, E. Chang, N. Iftimia, D. Mehta, and J. Kobler, “Dynamic vocal fold imaging by integrating optical coherence tomography with laryngeal high-speed video endoscopy,” Proceedings of the Conference on Lasers and Electro-Optics (CLEO), pp. 1-2, 2015.

    We demonstrate three-dimensional vocal fold imaging during phonation by integrating optical coherence tomography with high-speed videoendoscopy. Results from ex vivo larynx experiments yield reconstructed vocal fold surface contours for ten phases of periodic motion.

    D. D. Mehta, D. D. Deliyski, S. M. Zeitels, M. Zañartu, and R. E. Hillman, “Integration of transnasal fiberoptic high-speed videoendoscopy with time-synchronized recordings of vocal function,” K. Izdebski, Y. Yan, R. R. Ward, B. J. F. Wong, and R. M. Cruz, Eds. San Francisco: Pacific Voice & Speech Foundation, 2015, pp. 105-114.
    D. D. Deliyski, R. E. Hillman, and D. D. Mehta, “Laryngeal high-speed videoendoscopy: Rationale and recommendation for accurate and consistent terminology,” Journal of Speech, Language, and Hearing Research, vol. 58, no. 5, pp. 1488-1492, 2015.

    Purpose: The authors discuss the rationale behind the term laryngeal high-speed videoendoscopy to describe the application of high-speed endoscopic imaging techniques to the visualization of vocal fold vibration. Method: Commentary on the advantages of using accurate and consistent terminology in the field of voice research is provided. Specific justification is described for each component of the term high-speed videoendoscopy, which is compared and contrasted with alternative terminologies in the literature. Results: In addition to the ubiquitous high-speed descriptor, the term endoscopy is necessary to specify the appropriate imaging technology and distinguish among modalities such as ultrasound, magnetic resonance imaging, and nonendoscopic optical imaging. Furthermore, the term video critically indicates the electronic recording of a sequence of optical still images representing scenes in motion, in contrast to strobed images using high-speed photography and non-optical high-speed magnetic resonance imaging. High-speed videoendoscopy thus concisely describes the technology and can be appended by the desired anatomical nomenclature such as laryngeal. Conclusions: Laryngeal high-speed videoendoscopy strikes a balance between conciseness and specificity when referring to the typical high-speed imaging method performed on human participants. Guidance for the creation of future terminology provides clarity and context for current and future experiments and the dissemination of results among researchers.

    G. Luegmair, D. D. Mehta, J. B. Kobler, and M. Döllinger, “Three-dimensional optical reconstruction of vocal fold kinematics using high-speed video with a laser projection system,” IEEE Transactions on Medical Imaging, vol. 34, no. 12, pp. 2572-2582, 2015.
    M. L. Cooke, D. D. Mehta, and R. E. Hillman, “Relationships between the Cepstral-Spectral Index of Dysphonia and vocal fold vibratory function during phonation,” Proceedings of the 43rd Annual Symposium of the Voice Foundation: Care of the Professional Voice, 2014.
    M. Zañartu, J. C. Ho, D. D. Mehta, R. E. Hillman, and G. R. Wodicka, “Acoustic coupling during incomplete glottal closure and its effect on the inverse filtering of oral airflow,” Proceedings of Meetings on Acoustics, vol. 19, pp. 060241-7, 2013.
    D. D. Mehta, et al., “High-speed videomicroscopy and acoustic analysis of ex vivo vocal fold vibratory asymmetry,” Proceedings of the 10th International Conference on Advances in Quantitative Laryngology, Voice and Speech Research, 2013.
    D. D. Mehta and R. E. Hillman, “The evolution of methods for imaging vocal fold phonatory function,” Perspectives on Speech Science and Orofacial Disorders, vol. 22, no. 1, pp. 5-13, 2012.

    In this article, we provide a brief summary of the major technological advances that led to current methods for imaging vocal fold vibration during phonation including the development of indirect laryngoscopy, imaging of rapid motion, fiber optics, and digital image capture. We also provide a brief overview of new emerging technologies that could be used in the future for voice research and clinical voice assessment, including advances in laryngeal high-speed videoendoscopy, depth-kymography, and dynamic optical coherence tomography.