Purpose This study attempts to gain insights into the role of daily voice use in the etiology and pathophysiology of phonotraumatic vocal hyperfunction (PVH) by applying a logistic regression-based daily phonotrauma index (DPI) to predict group-based improvements in patients with PVH after laryngeal surgery and/or postsurgical voice therapy. Method A custom-designed ambulatory voice monitor was used to collect 1 week of pre- and postsurgery data from 27 female patients with PVH; 13 of these patients were also monitored after postsurgical voice therapy. Normative weeklong data were obtained from 27 matched controls. Each week was represented by the DPI, which combines two parameters: the skewness of neck-surface acceleration magnitude and the standard deviation of the difference between the first and second harmonic amplitudes (H1-H2). Results Compared to pretreatment, the DPI significantly decreased in the patient group after surgery (Cohen's d effect size = -0.86) and after voice therapy (d = -1.06). The patient group DPI normalized only after voice therapy. Conclusions The DPI produced the expected pattern of improved ambulatory voice use across laryngeal surgery and postsurgical voice therapy in a group of patients with PVH. The results were interpreted as providing new objective information about the role of daily voice use in the etiology and pathophysiology of PVH. The DPI is viewed as an estimate of potential vocal fold trauma that relies on combining the long-term distributional characteristics of two parameters representing the magnitude of phonatory forces (neck-surface acceleration magnitude) and vocal fold closure dynamics (H1-H2). Further validation of the DPI is needed to better understand its potential clinical use.
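The DPI is described as a logistic regression-based index that combines two weekly distributional features. The sketch below shows the general form of such an index under stated assumptions. The function name `daily_phonotrauma_index`, the intercept, and both weights are hypothetical placeholders, not the published model. The signs of the weights are also an assumption: they are chosen so that more negatively skewed acceleration magnitude and lower H1-H2 variability push the index upward.

```python
import math

# Hypothetical coefficients chosen only to illustrate the form of the model;
# they are NOT the published DPI weights, which this abstract does not report.
B0 = 0.0           # intercept
B_ACC_SKEW = -1.0  # weight on skewness of neck-surface acceleration magnitude
B_H1H2_SD = -1.0   # weight on standard deviation of H1-H2 (dB)


def daily_phonotrauma_index(acc_skewness, h1h2_sd):
    """Logistic-model output in (0, 1) computed from two weekly features.

    The negative weights are assumed: more negatively skewed acceleration
    magnitude and lower H1-H2 variability both raise the index.
    """
    z = B0 + B_ACC_SKEW * acc_skewness + B_H1H2_SD * h1h2_sd
    return 1.0 / (1.0 + math.exp(-z))


# Two hypothetical weeks: the first shows the assumed patient-like pattern.
week_a = daily_phonotrauma_index(acc_skewness=-0.8, h1h2_sd=2.5)
week_b = daily_phonotrauma_index(acc_skewness=0.3, h1h2_sd=4.0)
print(week_a > week_b)  # True
```

With real data, the coefficients would come from fitting a logistic regression to labeled patient and control weeks. Only the functional form is meant to carry over.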
Purpose The purpose of this study was to determine whether estimates of glottal aerodynamic measures based on neck-surface vibration are comparable to those previously obtained using oral airflow and air pressure signals (Espinoza et al., 2017) in terms of discriminating patients with phonotraumatic and nonphonotraumatic vocal hyperfunction (PVH and NPVH) from vocally healthy controls. Method Consecutive /pæ/ syllables at comfortable and loud levels were produced by 16 women with PVH (organic vocal fold lesions), 16 women with NPVH (primary muscle tension dysphonia), and 32 vocally healthy women who were each matched to a patient according to age and occupation. Subglottal impedance-based inverse filtering of the anterior neck-surface accelerometer (ACC) signal yielded estimates of peak-to-peak glottal airflow, open quotient, and maximum flow declination rate. Average subglottal pressure and microphone-based sound pressure level (SPL) were also estimated from the ACC signal using subject-specific linear regression models. The ACC-based measures of glottal aerodynamics were normalized for SPL and statistically compared between each patient group and its matched-control group. Results Patients with PVH and NPVH exhibited lower SPL-normalized glottal aerodynamics values than their respective control subjects (p values ranging from < .01 to .07) with very large effect sizes (1.04-2.16), regardless of loudness condition or measurement method (i.e., ACC-based values maintained discriminatory power). Conclusions The results of this study demonstrate that ACC-based estimates of most glottal aerodynamic measures are comparable to those previously obtained from oral airflow and air pressure (Espinoza et al., 2017) in terms of differentiating between hyperfunctional (PVH and NPVH) and normal vocal function. ACC-based estimates of glottal aerodynamic measures may be used to assess vocal function during continuous speech. This enables the assessment of daily voice use during ambulatory monitoring, which may provide better insight into the pathophysiological mechanisms associated with vocal hyperfunction.
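The abstract states that SPL and subglottal pressure were estimated from the ACC signal with subject-specific linear regression models. A minimal sketch of that idea for SPL follows. It assumes a least-squares line fit on simultaneously recorded ACC level and microphone SPL for one speaker. The function names and the synthetic calibration data are illustrative only, and the study's SPL normalization step is not reproduced.

```python
import numpy as np


def fit_subject_model(acc_level_db, spl_db):
    """Fit SPL = slope * ACC level + intercept for one subject (least squares)."""
    slope, intercept = np.polyfit(acc_level_db, spl_db, deg=1)
    return slope, intercept


def predict_spl(model, acc_level_db):
    """Apply a subject's fitted line to new ACC levels."""
    slope, intercept = model
    return slope * np.asarray(acc_level_db, dtype=float) + intercept


# Synthetic calibration data for one hypothetical subject
rng = np.random.default_rng(0)
acc = np.linspace(60.0, 90.0, 30)                       # ACC level, dB (arbitrary reference)
spl = 0.9 * acc + 5.0 + rng.normal(0.0, 0.5, acc.size)  # paired microphone SPL, dB
model = fit_subject_model(acc, spl)
print(model, predict_spl(model, [75.0]))
```

Fitting one model per speaker, rather than one pooled model, accommodates individual differences in tissue and accelerometer coupling. This is the motivation behind the "subject-specific" wording.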
Purpose The purpose of this viewpoint article is to facilitate research on vocal hyperfunction (VH). VH is implicated in the most commonly occurring types of voice disorders, but there remains a pressing need to increase our understanding of the etiological and pathophysiological mechanisms associated with VH to improve the prevention, diagnosis, and treatment of VH-related disorders. Method A comprehensive theoretical framework for VH is proposed based on an integration of prevailing clinical views and research evidence. Results The fundamental structure of the current framework is based on a previous (simplified) version that was published over 30 years ago (Hillman et al., 1989). A central premise of the framework is that there are two primary manifestations of VH (phonotraumatic VH and nonphonotraumatic VH) and that multiple factors contribute and interact in different ways to cause and maintain these two types of VH. Key hypotheses are presented about the way different factors may contribute to phonotraumatic VH and nonphonotraumatic VH and how the associated disorders may respond to treatment. Conclusions This updated and expanded framework is meant to help guide future research, particularly the design of longitudinal studies, which can lead to a refinement in knowledge about the etiology and pathophysiology of VH-related disorders. Such new knowledge should lead to further refinements in the framework and serve as a basis for improving the prevention and evidence-based clinical management of VH.
Speakers typically modify their voice in the presence of increased background noise levels, exhibiting the classic Lombard effect. Lombard-related characteristics during everyday activities were recorded from 17 vocally healthy women who wore an acoustic noise dosimeter and ambulatory voice monitor. The linear relationship between vocal sound pressure level and environmental noise level exhibited an average slope of 0.54 dB/dB and a value of 72.8 dB SPL at 50 dBA when correlation coefficients were greater than 0.4. These results, coupled with analyses of spectral and cepstral vocal function measures, provide normative ambulatory Lombard characteristics for comparison with patients with voice use-related disorders.
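The Lombard characteristics above reduce to a straight-line fit of voice SPL against noise level. That fit is summarized by its slope and by its value at 50 dBA, and it is retained only when the correlation is strong enough. A minimal sketch follows, assuming a Pearson correlation screen and ordinary least squares. The function name `lombard_fit` and the synthetic data are hypothetical.

```python
import numpy as np


def lombard_fit(noise_dba, voice_spl_db, min_r=0.4, ref_noise_dba=50.0):
    """Return (slope in dB/dB, voice SPL at ref_noise_dba), or None if r <= min_r."""
    noise = np.asarray(noise_dba, dtype=float)
    spl = np.asarray(voice_spl_db, dtype=float)
    r = np.corrcoef(noise, spl)[0, 1]  # Pearson correlation (assumed)
    if r <= min_r:
        return None                    # too weak to characterize a Lombard slope
    slope, intercept = np.polyfit(noise, spl, 1)
    return slope, slope * ref_noise_dba + intercept


# Synthetic example: one hypothetical speaker's paired noise and voice levels
rng = np.random.default_rng(1)
noise = rng.uniform(40.0, 70.0, 200)
spl = 72.8 + 0.54 * (noise - 50.0) + rng.normal(0.0, 1.0, noise.size)
print(lombard_fit(noise, spl))
```

Reporting the line's value at a fixed noise level (50 dBA), rather than its intercept at 0 dBA, keeps the summary within the range of noise levels actually observed.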
Modern operational environments can place significant demands on a service member's cognitive resources, increasing the risk of errors or mishaps due to overburden. The ability to monitor cognitive burden and associated performance within operational environments is critical to improving mission readiness. As a key step toward a field-ready system, we developed a simulated marksmanship scenario with an embedded working memory task in an immersive virtual reality environment. As participants performed the marksmanship task, they were instructed to remember numbered targets and recall the sequence of those targets at the end of the trial. Low and high cognitive load conditions were defined as the recall of three- and six-digit strings, respectively. Physiological and behavioral signals recorded included speech, heart rate, breathing rate, and body movement. These features were input into a random forest classifier that significantly discriminated between the low- and high-cognitive load conditions (AUC = 0.94). Behavioral features of gait were the most informative, followed by features of speech. We also showed the capability to predict performance on the digit recall (AUC = 0.71) and marksmanship (AUC = 0.58) tasks. The experimental framework can be leveraged in future studies to quantify the interaction of other types of stressors and their impact on operational cognitive and physical performance.
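The reported AUC values summarize how well classifier scores rank high-load trials above low-load trials. The sketch below computes AUC directly from that ranking interpretation (the Mann-Whitney formulation, with ties counted as half). It does not reproduce the random forest or its features, and the scores are made up for illustration.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for high-load (1) and low-load (0) trials
scores = [0.9, 0.8, 0.35, 0.6, 0.3, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(roc_auc(scores, labels))  # 8 of 9 positive-negative pairs ordered correctly
```

Under this reading, an AUC of 0.5 corresponds to chance ranking. The marksmanship prediction (AUC = 0.58) is therefore only modestly above chance, whereas the load classification (AUC = 0.94) ranks nearly all pairs correctly.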
The goal of this study was to employ frequently used analysis methods and tasks to identify values for cepstral peak prominence (CPP) that can aid clinical voice evaluation. Experiment 1 identified CPP values to distinguish speakers with and without voice disorders. Experiment 2 was an initial attempt to estimate auditory-perceptual ratings of overall dysphonia severity using CPP values.
CPP was computed using the Analysis of Dysphonia in Speech and Voice (ADSV) program and Praat. Experiment 1 included recordings from 295 patients with medically diagnosed voice disorders and 50 vocally healthy control speakers. Speakers produced sustained /a/ vowels and read the English-language Rainbow Passage. CPP cutoff values that best distinguished patient and control speakers were identified. Experiment 2 analyzed recordings from 32 English speakers with varying dysphonia severity and provided preliminary validation of the Experiment 1 cutoffs. Speakers sustained the /a/ vowel and read four sentences from the Consensus Auditory-Perceptual Evaluation of Voice protocol. Trained listeners provided auditory-perceptual ratings of overall dysphonia for the recordings. These ratings were then estimated from CPP values using a linear regression model, whose performance was evaluated using the coefficient of determination (r2).
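ADSV and Praat use different CPP settings, which is why separate cutoffs are needed for each program. For readers unfamiliar with the measure, the following is a simplified, hypothetical single-frame CPP sketch. It follows the general recipe: take the log spectrum, compute its cepstrum, search for the peak over plausible voice periods, and measure the height of that peak above a linear regression line. It is not the implementation used by either program, and its absolute values should not be compared with the published cutoffs. The 60-330 Hz search range and the 1 ms regression start are assumptions.

```python
import numpy as np


def cepstral_peak_prominence(frame, fs, f0_min=60.0, f0_max=330.0):
    """Simplified single-frame CPP in dB (not the ADSV or Praat algorithm)."""
    x = np.asarray(frame, dtype=float) * np.hanning(len(frame))
    log_spec = 20.0 * np.log10(np.abs(np.fft.rfft(x)) + 1e-12)      # dB spectrum
    ceps = 20.0 * np.log10(np.abs(np.fft.irfft(log_spec)) + 1e-12)  # cepstrum, dB
    quef = np.arange(len(ceps)) / fs                                  # quefrency, s
    # Peak search limited to quefrencies of plausible voice periods
    search = np.where((quef >= 1.0 / f0_max) & (quef <= 1.0 / f0_min))[0]
    peak = search[np.argmax(ceps[search])]
    # Regression line over quefrencies from 1 ms to half the cepstrum length
    fit = np.arange(int(0.001 * fs), len(ceps) // 2)
    slope, intercept = np.polyfit(quef[fit], ceps[fit], 1)
    return float(ceps[peak] - (slope * quef[peak] + intercept))


# Compare a periodic 200 Hz pulse train with white noise (synthetic frames)
fs, n = 16000, 2048
rng = np.random.default_rng(0)
pulses = np.zeros(n)
pulses[::fs // 200] = 1.0
periodic = pulses + 0.001 * rng.standard_normal(n)
noise = rng.standard_normal(n)
print(cepstral_peak_prominence(periodic, fs), cepstral_peak_prominence(noise, fs))
```

A strongly periodic frame produces a tall cepstral peak at its period, so its CPP exceeds that of noise. This ordering is what makes lower CPP values suggestive of dysphonia.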
Experiment 1 identified CPP cutoff values of 11.46 dB (ADSV) and 14.45 dB (Praat) for the sustained /a/ vowels and 6.11 dB (ADSV) and 9.33 dB (Praat) for the Rainbow Passage. CPP values below those thresholds indicated the presence of a voice disorder with up to 94.5% accuracy. In Experiment 2, CPP values estimated ratings of overall dysphonia with r2 values up to .74.
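Because the cutoffs depend on both the analysis program and the speech task, a CPP value is only interpretable against the matching threshold. A minimal lookup sketch follows, using the reported values. The key names and helper functions are hypothetical, and the rule is a screening aid rather than a diagnosis.

```python
# CPP cutoffs (dB) from Experiment 1, keyed by (analysis program, speech task)
CPP_CUTOFFS_DB = {
    ("ADSV", "sustained /a/"): 11.46,
    ("Praat", "sustained /a/"): 14.45,
    ("ADSV", "Rainbow Passage"): 6.11,
    ("Praat", "Rainbow Passage"): 9.33,
}


def below_cpp_cutoff(cpp_db, program, task):
    """True when CPP falls below the cutoff for the same program and task."""
    return cpp_db < CPP_CUTOFFS_DB[(program, task)]


def classification_accuracy(flags, has_disorder):
    """Fraction of speakers whose cutoff-based flag matches their diagnosis."""
    matches = sum(f == d for f, d in zip(flags, has_disorder))
    return matches / len(has_disorder)


# The same CPP value is interpreted differently depending on program and task
print(below_cpp_cutoff(10.2, "ADSV", "sustained /a/"))     # True  (10.2 < 11.46)
print(below_cpp_cutoff(10.2, "Praat", "Rainbow Passage"))  # False (10.2 >= 9.33)
```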
The CPP cutoff values identified in Experiment 1 provide normative reference points for clinical voice evaluation based on sustained /a/ vowels and the Rainbow Passage. Experiment 2 provides an initial predictive framework that can be used to relate CPP values to the auditory perception of overall dysphonia severity based on sustained /a/ vowels and Consensus Auditory-Perceptual Evaluation of Voice sentences.
Given the established linear relationship between neck surface vibration magnitude and mean subglottal pressure (Ps) in vocally healthy speakers, the purpose of this study was to better understand the impact of the presence of a voice disorder on this baseline relationship.
Data were obtained from participants with voice disorders representing a variety of glottal conditions, including phonotraumatic vocal hyperfunction, nonphonotraumatic vocal hyperfunction, and unilateral vocal fold paralysis. Participants were asked to repeat /p/-vowel syllable strings, progressing from loud to soft, in multiple vowel contexts (/pa/, /pi/, /pu/) and at multiple pitch levels (comfortable, higher than comfortable, lower than comfortable). Three statistical metrics (coefficient of determination [r2], slope, and intercept) were computed to characterize the regression line between neck surface accelerometer (ACC) signal magnitude and Ps within and across pitch level, vowel context, and voice disorder category. Three linear mixed-effects models were used to evaluate the impact of voice disorder category, pitch level, and vowel context on the relationship between ACC signal magnitude and Ps.
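The three per-speaker regression metrics can be computed from a single least-squares line. A minimal sketch follows, assuming that Ps is regressed on ACC signal magnitude (the direction used when predicting Ps) and using hypothetical values. The linear mixed-effects models are not reproduced here.

```python
import numpy as np


def regression_metrics(acc_magnitude, ps):
    """Return (r2, slope, intercept) for the least-squares line Ps = slope * ACC + intercept."""
    x = np.asarray(acc_magnitude, dtype=float)
    y = np.asarray(ps, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    r2 = 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)
    return float(r2), float(slope), float(intercept)


# Hypothetical speaker: ACC magnitude (dB) and Ps (cm H2O) from loud-to-soft strings
acc_db = [70.0, 74.0, 78.0, 82.0, 86.0]
ps_cmh2o = [4.1, 5.0, 6.2, 6.9, 8.1]
print(regression_metrics(acc_db, ps_cmh2o))
```

A lower slope or a higher intercept for a patient than for controls would correspond to the reported pattern of higher Ps at similar ACC magnitude.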
The relationship between ACC signal magnitude and Ps was statistically different in patients with voice disorders than in vocally healthy controls; patients exhibited higher levels of Ps given similar values of ACC signal magnitude. Negligible effects were found for pitch condition within each voice disorder category, and negligible-to-small effects were found for vowel context. The mean of patient-specific r2 values was .63, ranging from .13 to .92.
The baseline linear relationship between ACC signal magnitude and Ps is affected by the presence of a voice disorder, and the relationship is participant-specific. Further work is needed to improve ACC-based prediction of Ps, including across treatment and during naturalistic speech production.
This study introduces the in vivo application of a Bayesian framework to estimate subglottal pressure, laryngeal muscle activation, and vocal fold contact pressure from calibrated transnasal high-speed videoendoscopy and oral airflow data. A subject-specific, lumped-element vocal fold model is estimated using an extended Kalman filter and two observation models involving glottal area and glottal airflow. Model-based inferences using data from a vocally healthy male individual are compared with empirical estimates of subglottal pressure and reference values for muscle activation and contact pressure in the literature, thus providing baseline error metrics for future clinical investigations.
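The sketch below roughly illustrates how an extended Kalman filter infers a hidden quantity from a nonlinear observation. It tracks one scalar state (a pressure-like value) observed through an assumed orifice-flow relation, u = k * sqrt(p). This is a deliberately reduced toy. It does not represent the study's subject-specific lumped-element vocal fold model, its glottal area and airflow observation models, or its muscle activation and contact pressure states, and all parameter values are hypothetical.

```python
import math
import random


def ekf_constant_pressure(flow_obs, k=1.0, p0=5.0, var0=25.0, q=1e-4, r=0.01):
    """Scalar extended Kalman filter for a slowly varying state p seen through u = k*sqrt(p)."""
    p_est, var = p0, var0
    for u in flow_obs:
        var += q                                   # predict (random-walk state model)
        h = k * math.sqrt(p_est)                   # predicted observation
        H = k / (2.0 * math.sqrt(p_est))           # Jacobian dh/dp at the current estimate
        s = H * var * H + r                        # innovation variance
        gain = var * H / s                         # Kalman gain
        p_est = max(p_est + gain * (u - h), 1e-6)  # update; keep p in the sqrt domain
        var = (1.0 - gain * H) * var               # posterior variance
    return p_est, var


# Synthetic observations from a hypothetical true pressure of 8 (arbitrary units)
rng = random.Random(1)
true_p = 8.0
obs = [math.sqrt(true_p) + rng.gauss(0.0, 0.1) for _ in range(200)]
print(ekf_constant_pressure(obs))
```

The "extended" part is the linearization step: the nonlinear observation function is replaced by its local slope (H) at each update. The study applies the same idea to a multi-state vocal fold model.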
Previous work using ambulatory voice recordings has shown no differences in average vocal behavior between patients with phonotraumatic vocal hyperfunction and matched controls. This study used larger groups to replicate these results and expanded the analysis to include distributional characteristics of ambulatory voice use and measures indicative of glottal closure.
Subjects included 180 adult women: 90 diagnosed with vocal fold nodules or polyps and 90 age-, sex-, and occupation-matched controls with no history of voice disorders. Weeklong summary statistics (average, variability, skewness, kurtosis) of voice use were computed from neck-surface acceleration recorded using an ambulatory voice monitor. Voice measures included estimates of sound pressure level (SPL), fundamental frequency (fo), cepstral peak prominence, and the difference between the first and second harmonic magnitudes (H1–H2).
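The weeklong summary statistics named above can be computed from frame-level values of any one voice measure. A minimal sketch follows. It assumes that "variability" means the sample standard deviation, that kurtosis is reported as excess kurtosis, and that frames without a valid value are stored as NaN. The function name is hypothetical.

```python
import numpy as np


def weekly_summary(frame_values):
    """Average, variability (SD), skewness, and excess kurtosis of one week of values."""
    x = np.asarray(frame_values, dtype=float)
    x = x[~np.isnan(x)]                       # drop frames without a valid value
    z = (x - x.mean()) / x.std()              # population-standardized values
    return {
        "average": float(x.mean()),
        "variability": float(x.std(ddof=1)),  # sample standard deviation
        "skewness": float(np.mean(z ** 3)),
        "kurtosis": float(np.mean(z ** 4) - 3.0),
    }


# A week dominated by high values with a tail toward low values is negatively skewed
print(weekly_summary([1.0, 9.0, 9.0, 10.0, 10.0, np.nan]))
```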
Statistical comparisons resulted in medium–large differences (Cohen's d ≥ 0.5) between groups for SPL skewness, fo variability, and H1–H2 variability. Two logistic regressions (theory-based and stepwise) identified SPL skewness and H1–H2 variability as the features that classified patients and controls from their weekly voice data, with areas under the receiver operating characteristic curve of 0.85 and 0.82 on the training and test sets, respectively.
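The medium-to-large threshold refers to Cohen's d. A minimal sketch follows of d for two independent groups (for example, patients and matched controls), standardized by the pooled standard deviation. This is one common convention; other analyses, such as within-patient pre/post comparisons, may use a different denominator.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups, standardized by the pooled SD."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Hypothetical weekly values for three patients and three controls
print(cohens_d([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))  # 0.5
```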
Compared to controls, the weekly voice use of patients with phonotraumatic vocal hyperfunction reflected higher SPL tendencies (negatively skewed SPL) with more abrupt glottal closure (reduced H1–H2 variability, especially toward higher values). Further work could examine posttreatment data (e.g., after surgery and/or therapy) to determine the extent to which these differences are associated with the etiology and pathophysiology of phonotraumatic vocal fold lesions.
Subglottal air pressure plays a major role in voice production and is a primary factor in controlling voice onset, offset, sound pressure level, glottal airflow, vocal fold collision pressures, and variations in fundamental frequency. Previous work has shown promise for the estimation of subglottal pressure from an unobtrusive miniature accelerometer sensor attached to the anterior base of the neck during typical modal voice production across multiple pitch and vowel contexts. This study expands on that work to incorporate additional accelerometer-based measures of vocal function to compensate for non-modal phonation characteristics and achieve an improved estimation of subglottal pressure. Subjects with normal voices repeated /p/-vowel syllable strings from loud-to-soft levels in multiple vowel contexts (/a/, /i/, and /u/), pitch conditions (comfortable, lower than comfortable, higher than comfortable), and voice quality types (modal, breathy, strained, and rough). Subject-specific, stepwise regression models were constructed using root-mean-square (RMS) values of the accelerometer signal alone (baseline condition) and in combination with cepstral peak prominence, fundamental frequency, and glottal airflow measures derived using subglottal impedance-based inverse filtering. Five-fold cross-validation assessed the robustness of model performance using the root-mean-square error metric for each regression model. Each cross-validation fold exhibited up to a 25% decrease in prediction error when the model incorporated multi-dimensional aspects of the accelerometer signal compared with RMS-only models. Improved estimation of subglottal pressure for non-modal phonation was thus achievable, supporting future studies of subglottal pressure estimation in patients with voice disorders and in ambulatory voice recordings.
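The cross-validation comparison can be sketched as follows, assuming ordinary least-squares models evaluated with 5-fold RMSE on synthetic data. The variable names are hypothetical and the stepwise feature selection used in the study is not implemented. Averaging RMSE over folds is also a simplification of the per-fold reporting above.

```python
import numpy as np


def cv_rmse(features, target, n_folds=5, seed=0):
    """Mean root-mean-square error of an ordinary least-squares model across k folds."""
    y = np.asarray(target, dtype=float)
    X = np.column_stack([np.ones(len(y)), features])  # intercept + predictors
    idx = np.random.default_rng(seed).permutation(len(y))
    fold_rmse = []
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        coef, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        resid = y[test] - X[test] @ coef
        fold_rmse.append(np.sqrt(np.mean(resid ** 2)))
    return float(np.mean(fold_rmse))


# Synthetic data: pressure depends on ACC RMS level and on a second (CPP-like) feature
rng = np.random.default_rng(2)
acc_rms_db = rng.uniform(60.0, 90.0, 150)
cpp_db = rng.uniform(5.0, 15.0, 150)
ps = 0.2 * acc_rms_db + 0.5 * cpp_db - 5.0 + rng.normal(0.0, 0.3, 150)

baseline = cv_rmse(acc_rms_db, ps)
combined = cv_rmse(np.column_stack([acc_rms_db, cpp_db]), ps)
print(f"RMS-only: {baseline:.2f}, combined: {combined:.2f}, "
      f"decrease: {100 * (1 - combined / baseline):.0f}%")
```

Because every fold is scored on data held out from fitting, a lower combined RMSE reflects genuine predictive gain from the extra feature, not just added model flexibility.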
We previously developed an instrument called the Aerodynamic Vocal Fold Driver (AVFD) for intraoperative magnified assessment of vocal fold (VF) vibration during microlaryngoscopy under general anesthesia. Excised larynx testing showed that the AVFD could provide useful information about the vibratory characteristics of each VF independently. The present investigation expands those findings by testing new iterations of the AVFD during microlaryngoscopy in the canine model.
The AVFD is a handheld instrument that is positioned to contact the phonatory mucosa of either VF during microlaryngoscopy. Airflow delivered through the AVFD shaft to the subglottis drives the VF into phonation‐like vibration, which enables magnified observation of mucosal‐wave function with stroboscopy or high‐speed video. AVFD‐driven phonation was tested intraoperatively (n = 26 VFs) using either the original instrument design or smaller and larger versions three‐dimensionally printed from a medical grade polymer. A high‐fidelity pressure sensor embedded within the AVFD measured VF contact pressure. Characteristics of individual VF phonation were compared with those of typical two‐fold phonation, and VFs scarred by electrocautery (n = 4) were compared with controls (n = 22).
Phonation was successful in all 26 VFs, even when scar prevented conventional bilateral phonation. The 15‐mm‐wide AVFD fits best within the anteroposterior dimension of the musculo‐membranous VF, and VF contact pressure correlated with acoustic output, driving pressures, and visible modes of vibration.
The AVFD can reveal magnified vibratory characteristics of individual VFs during microlaryngoscopy (i.e., without requiring patient participation), potentially providing information that is not apparent or available during conventional awake phonation, which might facilitate phonosurgical decision making.
The ability to provide absolute calibrated measurement of the laryngeal structures during phonation is of paramount importance to voice science and clinical practice. Calibrated three-dimensional measurement could provide essential information for modeling purposes, for studying the developmental aspects of vocal fold vibration, for refining functional voice assessment and treatment outcomes evaluation, and for more accurate staging and grading of laryngeal disease. Recently, a laser-calibrated transnasal fiberoptic endoscope compatible with high-speed videoendoscopy (HSV) and capable of providing three-dimensional measurements was developed. The optical principle employed is to project a grid of 7 × 7 green laser points across the field of view (FOV) at an angle relative to the imaging axis, such that (after calibration) the position of each laser point within the FOV encodes the vertical distance from the tip of the endoscope to the laryngeal tissues. The purpose of this study was to develop a precise method for vertical calibration of the endoscope. Investigating the position of the laser points showed that, besides the vertical distance, they also depend on the parameters of the lens coupler, including the FOV position within the image frame and the rotation angle of the endoscope. The presented automatic calibration method was developed to compensate for the effect of these parameters. Statistical image processing and pattern recognition were used to detect the FOV, the center of FOV, and the fiducial marker. This step normalizes the HSV frames to a standard coordinate system and removes the dependence of the laser-point positions on the parameters of the lens coupler. Then, using a statistical learning technique, a calibration protocol was developed to model the trajectories of all laser points as the working distance was varied. Finally, a set of experiments was conducted to measure the accuracy and reliability of every step of the procedure. 
The system was able to measure absolute vertical distance with mean percent error in the range of 1.7% to 4.7%, depending on the working distance.
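The sketch below illustrates distance calibration and the mean percent error metric. It inverts a single laser point's position-versus-distance trajectory using a lookup table and linear interpolation. This is a hypothetical stand-in, not the statistical learning technique used in the study. The trajectory function `simulated_position` and the 10-40 mm calibration range are invented for the example.

```python
import numpy as np


def build_calibration(distances_mm, positions):
    """Return calibration arrays sorted by position, as np.interp requires."""
    pos = np.asarray(positions, dtype=float)
    dist = np.asarray(distances_mm, dtype=float)
    order = np.argsort(pos)
    return pos[order], dist[order]


def distance_from_position(calibration, position):
    """Estimate working distance by linear interpolation in the calibration table."""
    pos_sorted, dist_sorted = calibration
    return np.interp(position, pos_sorted, dist_sorted)


def mean_percent_error(estimated, true):
    est = np.asarray(estimated, dtype=float)
    tru = np.asarray(true, dtype=float)
    return float(np.mean(np.abs(est - tru) / tru) * 100.0)


def simulated_position(distance_mm):
    """Hypothetical trajectory: normalized image coordinate of one laser point."""
    return 0.3 + 2.0 / np.asarray(distance_mm, dtype=float)


grid = np.arange(10.0, 41.0, 1.0)  # calibration distances, mm
calibration = build_calibration(grid, simulated_position(grid))
test_mm = np.array([12.5, 20.25, 33.7])
estimates = distance_from_position(calibration, simulated_position(test_mm))
print(mean_percent_error(estimates, test_mm))
```

With real endoscope images, detection noise in the laser-point position dominates the error. The relative error grows where the trajectory flattens, which is consistent with an error range that depends on working distance.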
A major limitation of comparing the efficacy of videostroboscopy (VS) and high-speed videoendoscopy (HSV) is the lack of an objective reference by which to compare the functional assessment ratings of the two techniques. For patients with vocal fold mass lesions, intraoperative measures of lesion size and depth may serve as this objective reference. This study compared the relationships between the pre- to postoperative change in VS and HSV visual-perceptual ratings to intraoperative measures of lesion size and depth.
Prospective visual-perceptual study with intraoperative measures of lesion size and depth.
VS and HSV samples were obtained preoperatively and postoperatively from 28 patients with vocal fold lesions and from 17 vocally healthy controls. Two experienced clinicians rated amplitude, mucosal wave, vertical phase difference, left-right phase asymmetry, and vocal fold edge on a visual-analog scale using both imaging techniques. The change in perioperative ratings from VS and HSV was compared between groups and correlated to intraoperative measures of lesion size and depth.
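A minimal sketch of the correlation step follows. It assumes Pearson correlation between the perioperative change in one rating (postoperative minus preoperative) and lesion area. Unratable samples (for example, VS sequences with asynchronicity) are stored as NaN and excluded pairwise. The function name and data are hypothetical.

```python
import numpy as np


def change_vs_lesion(pre, post, lesion_area):
    """Pearson r between rating change (post - pre) and lesion area, plus the n used."""
    change = np.asarray(post, dtype=float) - np.asarray(pre, dtype=float)
    area = np.asarray(lesion_area, dtype=float)
    valid = ~np.isnan(change) & ~np.isnan(area)  # skip unratable samples
    r = np.corrcoef(change[valid], area[valid])[0, 1]
    return float(r), int(valid.sum())


# Hypothetical visual-analog ratings; the third patient's preoperative sample was unratable
pre = [20.0, 30.0, np.nan, 40.0, 35.0]
post = [45.0, 50.0, 60.0, 70.0, 55.0]
area_mm2 = [4.0, 2.5, 3.0, 5.0, 3.5]
print(change_vs_lesion(pre, post, area_mm2))
```

Returning the number of usable pairs alongside r makes it visible when one imaging modality yields change scores for fewer patients than the other.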
HSV was as reliable as VS for ratings of amplitude and edge, and substantially more reliable for ratings of mucosal wave and left-right phase asymmetry. Both VS and HSV had mild-moderate correlations between change in perioperative ratings and intraoperative measures of lesion area. Change in function could be obtained in more patients and for more parameters using HSV than VS. Group differences were noted for postoperative ratings of amplitude and edge; however, these differences were within one level of the visual-perceptual rating scale. The presence of asynchronicity in VS recordings renders vibratory features either uninterpretable or potentially distorted and thus should not be rated.
Amplitude and edge are robust vibratory measures for perioperative functional assessment, regardless of imaging modality. HSV is indicated for evaluation of subepithelial lesions or if asynchronicity is present in the VS image sequence.