OBJECTIVE: Relative fundamental frequency (RFF) has been suggested as a potential acoustic measure of vocal effort. However, current clinical standards for RFF measurement require time-consuming manual marking. Semi-automated algorithms have previously been developed to calculate RFF from microphone signals. The current study aimed to develop fully automated algorithms to calculate RFF from neck-surface accelerometer signals for ecological momentary assessment and ambulatory monitoring of voice. METHODS: A training set of 2646 /vowel-fricative-vowel/ utterances from 317 unique speakers, with and without voice disorders, was used to develop automated algorithms to calculate RFF values from neck-surface accelerometer signals. The algorithms first rejected utterances with poor vowel-to-noise ratios, then identified fricative locations, then used signal features to determine voicing boundary cycles, and finally calculated the corresponding RFF values. These automated RFF values were compared with the clinical gold standard of manual RFF calculated from simultaneously collected microphone signals in a novel test set of 639 utterances from 77 unique speakers. RESULTS: Automated accelerometer-based RFF values yielded an average mean bias error (MBE) across all cycles of 0.027 semitones (ST), with MBEs of 0.152 ST and -0.252 ST in the offset and onset cycles closest to the fricative, respectively. CONCLUSION: All MBE values were smaller than the expected changes in RFF following successful voice therapy, suggesting that the current algorithms could be used for ecological momentary assessment and ambulatory monitoring via neck-surface accelerometer signals.
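RFF expresses the fundamental frequency of each voicing offset or onset cycle relative to a reference cycle, in semitones. The MBE above is the mean signed difference between automated and manual values. The sketch below assumes that the reference is the cycle farthest from the fricative, which is typical of manual RFF protocols. The function names and example frequencies are hypothetical, and this is not the authors' automated algorithm.

```python
import math
from statistics import mean


def rff_semitones(cycle_f0s_hz, reference_f0_hz):
    """RFF of each cycle in semitones: 12 * log2(f0_cycle / f0_reference)."""
    return [12.0 * math.log2(f0 / reference_f0_hz) for f0 in cycle_f0s_hz]


def mean_bias_error(estimated, reference):
    """Mean signed error; positive values mean the estimates are too high."""
    if len(estimated) != len(reference):
        raise ValueError("estimated and reference must have the same length")
    return mean(e - r for e, r in zip(estimated, reference))


# Hypothetical offset cycles (Hz), ordered from farthest to closest to the fricative.
offset_f0s = [200.0, 200.5, 199.0, 197.0, 194.0]
offset_rff = rff_semitones(offset_f0s, reference_f0_hz=offset_f0s[0])
```

The reference cycle is 0 ST by construction. The form 12·log2(f/fref) is equivalent to the 39.86·log10(f/fref) form that sometimes appears in the literature.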
OBJECTIVE: Singers undergoing tonsillectomy are understandably concerned about possible effects on their voice. The surgical risks of laryngeal damage from intubation and of upper airway scarring are valid reasons for singers to carefully consider their options for treating tonsil-related symptoms. No prior studies have statistically assessed objective voice outcomes in a group of adult singers undergoing tonsillectomy. This study determined the impact of tonsillectomy on the adult singing voice by testing for statistically significant changes between preoperative and postoperative acoustic, aerodynamic, and Voice-Related Quality of Life (VRQOL) measures. STUDY DESIGN: Prospective cohort study. SETTING: Tertiary Referral Academic Hospital. SUBJECTS: Thirty singers undergoing tonsillectomy from 2012 to 2019. METHODS: Acoustic recordings were obtained with the Computerized Speech Lab (CSL) (Pentax CSL 4500) and analyzed with the Multidimensional Voice Program (MDVP) (Pentax MDVP) and Praat acoustic analysis software. Estimates of aerodynamic vocal efficiency were obtained and analyzed using the Phonatory Aerodynamic System (Pentax PAS 6600). Preoperative VRQOL scores were recorded, and singers were instructed to refrain from singing for 3 weeks following tonsillectomy. Repeat acoustic and aerodynamic measures as well as VRQOL scores were obtained at the first postoperative visit. RESULTS: Average postoperative acoustic (jitter, shimmer, HNR) and aerodynamic (sound pressure level divided by subglottal pressure) parameters related to laryngeal phonatory function did not differ significantly from preoperative measures. The only statistically significant change in postoperative measures of resonance was a decrease in the third formant (F3) for the /a/ vowel. Average postoperative VRQOL scores (89, SD 12.2) improved significantly from preoperative VRQOL scores (79.8, SD 18.7) (P = 0.007).
CONCLUSIONS: Tonsillectomy does not appear to alter laryngeal voice production in adult singers as measured by standard acoustic and aerodynamic parameters. The observed decrease in F3 for the /a/ vowel is hypothesized to result from the increase in pharyngeal cross-sectional area after removal of tonsillar tissue, but this change would not be expected to appreciably affect the perceptual characteristics of the vowel. Singers' self-assessment (VRQOL) improved after tonsillectomy.
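Jitter and shimmer quantify cycle-to-cycle perturbation of period and amplitude. The sketch below implements the "local" definitions used by Praat: the average absolute difference between consecutive cycles, divided by the average value. It also includes the SPL-to-subglottal-pressure ratio named in the Results. MDVP's exact algorithms may differ, and the function names and example values are hypothetical.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive cycles divided by the mean.

    Applied to cycle periods this is local jitter; applied to cycle peak
    amplitudes it is local shimmer. Multiply by 100 for a percentage.
    """
    if len(values) < 2:
        raise ValueError("at least two cycles are required")
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def spl_per_subglottal_pressure(spl_db, subglottal_pressure_cm_h2o):
    """Aerodynamic efficiency index: SPL (dB) divided by subglottal pressure (cm H2O)."""
    return spl_db / subglottal_pressure_cm_h2o


periods_s = [0.00500, 0.00505, 0.00498, 0.00502]  # hypothetical periods near 200 Hz
jitter_percent = 100.0 * local_perturbation(periods_s)
```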
The purpose of this study was to examine the psychometric properties of an ecological vocal effort scale linked to a voicing task.
Thirty-eight patients with nodules, 18 patients with muscle tension dysphonia, and 45 vocally healthy control individuals participated in a week of ambulatory voice monitoring. A global vocal status question was asked hourly throughout the day. Participants produced a vowel–consonant–vowel syllable string and rated the vocal effort needed to produce the task on a visual analog scale. Test–retest reliability was calculated for a subset using the intraclass correlation coefficient, ICC(A, 1). Construct validity was assessed by (a) comparing the weeklong vocal effort ratings between the patient and control groups and (b) comparing weeklong vocal effort ratings before and after voice rehabilitation in a subset of 25 patients. Cohen's d, the standard error of measurement (SEM), and the minimal detectable change (MDC) assessed sensitivity. The minimal clinically important difference (MCID) assessed responsiveness.
Test–retest reliability was excellent, ICC(A, 1) = .96. Weeklong mean effort was significantly higher in the patients than in controls (d = 1.62) and lower after voice rehabilitation (d = 1.75), supporting construct validity and sensitivity. SEM was 4.14, MDC was 11.47, and MCID was 9.74. Because the MCID was smaller than the MDC (i.e., within the error of the measure), we must rely on the MDC to detect real changes in ecological vocal effort.
The ecological vocal effort scale offers a reliable, valid, and sensitive method of monitoring vocal effort changes during the daily life of individuals with and without vocal hyperfunction.
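The reported SEM and MDC are consistent with the textbook formulas SEM = SD·√(1 - ICC) and MDC95 = 1.96·√2·SEM. For example, 1.96 × √2 × 4.14 ≈ 11.48, which matches the reported 11.47 once rounding of the SEM is taken into account. The sketch assumes these standard formulas, because the abstract does not state which variant was used. The SD in the example is hypothetical.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), with SD the between-subject standard deviation."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie in [0, 1]")
    return sd * math.sqrt(1.0 - icc)


def minimal_detectable_change_95(sem):
    """MDC at 95% confidence for a test-retest difference: 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


mdc = minimal_detectable_change_95(4.14)  # SEM reported in the Results
mcid = 9.74                               # MCID reported in the Results
mcid_exceeds_error = mcid > mdc           # False: the MCID lies within measurement error
```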
Objectives: Singers, college students, and females are groups known to be at an elevated risk of developing functional/hyperfunctional voice disorders; therefore, female college students majoring in vocal performance may be at an even higher risk. To mitigate this risk, it would be helpful to know the "safe limits" for voice use that would help maintain vocal health in this vulnerable group, but there is a paucity of high-quality objective information upon which to base such limits. This study employed weeklong ambulatory voice monitoring in a large group of vocally healthy female college student singers to begin providing the types of objective data that could be used to help develop improved vocal health guidelines.
Methods: Participants included 64 vocally healthy females currently enrolled in a vocal performance or similar program at a college or university. An ambulatory voice monitor recorded neck-surface acceleration throughout a typical week. A singing classifier was applied to the data to separate singing from speech. Weeklong vocal dose measures and distributional characteristics for standard voice measures were computed separately for singing and speech, and for both types of phonation combined.
Results: Participants spent 6.2% of the total monitoring time speaking and 2.1% singing (8.4% total phonation time). Singing had a higher fo mode, more pitch variability, higher average sound pressure level (SPL), negatively skewed SPL distributions, lower average CPP, and higher H1-H2 values than speaking.
Conclusions: These results provide a basis for beginning to establish vocal health guidelines for female students enrolled in college-level vocal performance programs and for future studies of the types of voice disorders that are common in this group. Results also demonstrate the potential value that ambulatory voice monitoring may have in helping to objectively identify vocal behaviors that could contribute to voice problems in this population.
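Vocal dose measures accumulate voice use over time. The sketch below computes two of them:

- time dose, the total voiced time;
- cycle dose, the total number of vocal fold oscillation cycles (f0 summed over voiced frames, multiplied by the frame duration).

Each dose is computed separately for the frames that a singing classifier labels as speech or as singing. The `Frame` record, its label values, and the fixed frame length are assumptions. Distance dose is omitted because it also requires an estimate of vibration amplitude.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    f0_hz: float  # 0.0 marks an unvoiced frame
    label: str    # hypothetical classifier output: "speech", "singing", or "none"


def vocal_doses(frames, frame_s, labels=None):
    """Return (time dose in s, cycle dose in cycles) over voiced frames.

    If `labels` is given, only voiced frames carrying one of those labels count.
    """
    voiced = [f for f in frames
              if f.f0_hz > 0 and (labels is None or f.label in labels)]
    time_dose_s = len(voiced) * frame_s
    cycle_dose = sum(f.f0_hz * frame_s for f in voiced)
    return time_dose_s, cycle_dose


def phonation_percent(frames, labels=None):
    """Voiced frames (optionally restricted to `labels`) as a percentage of all frames."""
    voiced = sum(1 for f in frames
                 if f.f0_hz > 0 and (labels is None or f.label in labels))
    return 100.0 * voiced / len(frames)


week = [Frame(200.0, "speech"), Frame(0.0, "none"),
        Frame(400.0, "singing"), Frame(220.0, "speech")]
speech_doses = vocal_doses(week, frame_s=0.05, labels={"speech"})
singing_doses = vocal_doses(week, frame_s=0.05, labels={"singing"})
```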
Purpose The aim of this study was to use the Daily Phonotrauma Index (DPI) to quantify group-based changes in the daily voice use of patients with phonotraumatic vocal hyperfunction (PVH) after receiving voice therapy as the sole treatment. This is part of an ongoing effort to validate an updated theoretical framework for PVH. Method A custom-designed ambulatory voice monitor was used to collect 1 week of pre- and posttreatment data from 52 female patients with PVH. Normative weeklong data were also obtained from 52 matched controls. Each week was represented by the DPI, which is a combination of neck-surface acceleration magnitude skewness and the standard deviation of the difference between the first and second harmonic magnitudes. Results Compared to pretreatment, the DPI decreased significantly toward normal in the patient group after treatment (Cohen's d = -0.25). The posttreatment patient group's DPI was still significantly higher than that of the control group (d = 0.68). Conclusions The DPI showed the pattern of improved ambulatory voice use in a group of patients with PVH following voice therapy that was predicted by the updated theoretical framework. Per the prediction, voice therapy was associated with a decreased potential for phonotrauma in daily voice use, but the posttreatment patient group data were still significantly different from the normative control group data. This posttreatment difference is interpreted as reflecting the impact on voice use of the persistence of phonotrauma-induced structural changes to the vocal folds. Further validation of the DPI is needed to better understand its potential clinical use.
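The DPI combines two weeklong features into a single index, so it can be sketched as a logistic model. The intercept and weights below are placeholders, not the published DPI coefficients. They are chosen only so that more negative acceleration-magnitude skewness and lower H1-H2 variability raise the index, which is the direction these abstracts associate with phonotrauma.

```python
import math


def daily_phonotrauma_index(acc_skewness, h1_h2_sd, b0, b_skew, b_sd):
    """Logistic combination of two weeklong features.

    acc_skewness: skewness of the neck-surface acceleration magnitude distribution
    h1_h2_sd:     standard deviation of H1-H2 (dB)
    b0, b_skew, b_sd: HYPOTHETICAL regression coefficients
    Returns a value in (0, 1).
    """
    z = b0 + b_skew * acc_skewness + b_sd * h1_h2_sd
    return 1.0 / (1.0 + math.exp(-z))


# Placeholder coefficients for illustration only.
COEFFS = {"b0": 2.0, "b_skew": -1.0, "b_sd": -0.5}
patient_like = daily_phonotrauma_index(-0.8, 3.0, **COEFFS)
control_like = daily_phonotrauma_index(0.2, 5.0, **COEFFS)
```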
Purpose Previous ambulatory voice monitoring studies have included many singers and have combined speech and singing in the analyses. This study applied a singing classifier to the ambulatory recordings of singers with phonotrauma and healthy controls to determine if analyzing speech and singing separately would reveal voice use differences that could provide new insights into the etiology and pathophysiology of phonotrauma in this at-risk population. Method Forty-two female singers with phonotrauma (vocal fold nodules or polyps) and 42 healthy matched controls were monitored using an ambulatory voice monitor. Weeklong statistics (average, standard deviation, skewness, kurtosis) for sound pressure level (SPL), fundamental frequency, cepstral peak prominence, the magnitude ratio of the first two harmonics (H1-H2), and three vocal dose measures were computed from the neck-surface acceleration signal and separated into singing and speech using a singing classifier. Results Mixed analysis of variance models found expected differences between singing and speech in each voice parameter, except SPL kurtosis. SPL skewness, SPL kurtosis, and all H1-H2 distributional parameters differentiated patients and controls when singing and speech were combined. Interaction effects were found in H1-H2 kurtosis and all vocal dose measures. Patients had significantly higher vocal doses in speech compared to controls. Conclusions Consistent with prior work, the pathophysiology of phonotrauma in singers is characterized by more abrupt/complete glottal closure (decreased mean and variation for H1-H2) and increased laryngeal forces (negatively skewed SPL distribution) during phonation. Application of a singing classifier to weeklong data revealed that singers with phonotrauma spent more time speaking on a weekly basis, but not more time singing, compared to controls.
Results are used as a basis for hypothesizing about the role of speaking voice in the etiology of phonotraumatic vocal hyperfunction in singers.
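Once each frame carries a singing-classifier label, the weeklong distributional statistics can be computed per phonation type and for all frames combined. The sketch below uses population (biased) moment estimators and reports kurtosis as excess kurtosis (0 for a normal distribution); the original analysis may have used a different convention. The label names and sample SPL values are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def moments(xs):
    """Mean, population SD, skewness, and excess kurtosis (requires non-constant data)."""
    mu = fmean(xs)
    sd = pstdev(xs, mu)
    n = len(xs)
    skew = sum((x - mu) ** 3 for x in xs) / n / sd ** 3
    kurt = sum((x - mu) ** 4 for x in xs) / n / sd ** 4 - 3.0
    return {"mean": mu, "sd": sd, "skewness": skew, "kurtosis": kurt}


def stats_by_class(values, labels):
    """Moments for each classifier label and for all frames combined."""
    values = list(values)
    groups = defaultdict(list)
    for value, label in zip(values, labels):
        groups[label].append(value)
    result = {label: moments(group) for label, group in groups.items()}
    result["combined"] = moments(values)
    return result


spl_db = [62.0, 65.0, 70.0, 80.0, 84.0, 79.0]  # hypothetical frame SPLs
labels = ["speech", "speech", "speech", "singing", "singing", "singing"]
summary = stats_by_class(spl_db, labels)
```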
Purpose The purpose of this study was to obtain a more comprehensive understanding of the pathophysiology and impact on daily voice use of nonphonotraumatic vocal hyperfunction (NPVH). Method An ambulatory voice monitor collected 1 week of data from 36 patients with NPVH and 36 vocally healthy matched controls. A subset of 11 patients with NPVH was monitored after voice therapy. Daily voice use measures included neck-skin acceleration magnitude, fundamental frequency (fo), cepstral peak prominence (CPP), and the difference between the first and second harmonic magnitudes (H1-H2). Additional comparisons included 118 patients with phonotraumatic vocal hyperfunction (PVH) and 89 additional vocally healthy controls. Results The NPVH group, compared to the matched control group, exhibited increased fo (Cohen's d = 0.6), reduced CPP (d = -0.9), and less positive H1-H2 skewness (d = -1.1). Classifiers used CPP mean and H1-H2 mode to maximally differentiate the NPVH and matched control groups (area under the receiver operating characteristic curve of 0.78). Classifiers performed well on unseen data: the logit decreased in patients with NPVH after therapy; ≥ 85% of the control and PVH groups were identified as "normal" or "not NPVH," respectively. Conclusions The NPVH group's daily voice use is less periodic (CPP), is higher pitched (fo), and has less abrupt vocal fold closure (H1-H2 skew) compared to the matched control group. The combination of CPP mean and H1-H2 mode appears to reflect a pathophysiological continuum in NPVH patients of inefficient phonation with minimal potential for phonotrauma. Further validation of the classification model is needed to better understand potential clinical uses. Supplemental Material https://doi.org/10.23641/asha.14390771.
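The area under the receiver operating characteristic curve reported above equals the probability that a randomly chosen patient receives a higher classifier score than a randomly chosen control. This is the Mann-Whitney formulation of the AUC. The sketch below assumes that higher scores (for example, the classifier logit) indicate NPVH. The example scores are hypothetical.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as P(score_pos > score_neg), counting ties as one half."""
    if not positive_scores or not negative_scores:
        raise ValueError("both groups need at least one score")
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical classifier logits (higher = more NPVH-like).
npvh_logits = [1.2, 0.4, 0.9, -0.1]
control_logits = [-0.8, 0.1, -1.5, 0.5]
auc = roc_auc(npvh_logits, control_logits)
```

The pairwise loop is O(n·m), which is adequate for groups of this size. A rank-based computation gives the same value more efficiently for large samples.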
The purpose of this paper is to report on the first in vivo application of a recently developed transoral, dual-sensor pressure probe that directly measures intraglottal, subglottal, and vocal fold collision pressures during phonation. Synchronous measurement of intraglottal and subglottal pressures was accomplished using two miniature pressure sensors mounted on the end of the probe and inserted transorally in a 78-year-old male who had previously undergone surgical removal of his right vocal fold for treatment of laryngeal cancer. The endoscopist used one hand to position the custom probe against the surgically medialized scar band that replaced the right vocal fold and used the other hand to position a transoral endoscope to record laryngeal high-speed videoendoscopy of the vibrating left vocal fold contacting the pressure probe. Visualization of the larynx during sustained phonation allowed the endoscopist to place the dual-sensor pressure probe such that the proximal sensor was positioned intraglottally and the distal sensor subglottally. The proximal pressure sensor was verified to be in the strike zone of vocal fold collision during phonation when the intraglottal pressure signal exhibited three characteristics: an impulsive peak at the start of the closed phase, a rounded peak during the open phase, and a minimum value around zero immediately preceding the impulsive peak of the subsequent phonatory cycle. Numerical voice production modeling was applied to validate model-based predictions of vocal fold collision pressure using kinematic vocal fold measures. The results successfully demonstrated the feasibility of in vivo measurement of vocal fold collision pressure in an individual with a hemilaryngectomy, motivating ongoing data collection designed to aid in the development of vocal dose measures that incorporate vocal fold collision and impact stresses.
The ambulatory assessment of vocal function can be significantly enhanced by having access to physiologically based features that describe underlying pathophysiological mechanisms in individuals with voice disorders. This type of enhancement can improve methods for the prevention, diagnosis, and treatment of behaviorally based voice disorders. Unfortunately, the direct measurement of important vocal features such as subglottal pressure, vocal fold collision pressure, and laryngeal muscle activation is impractical in laboratory and ambulatory settings. In this study, we introduce a method to estimate these features during phonation from a neck-surface vibration signal through a framework that integrates a physiologically relevant model of voice production and machine learning tools. The signal from a neck-surface accelerometer is first processed using subglottal impedance-based inverse filtering to yield an estimate of the unsteady glottal airflow. Seven aerodynamic and acoustic features are extracted from the neck-surface accelerometer and an optional microphone signal. A neural network architecture is selected to provide a mapping between the seven input features and subglottal pressure, vocal fold collision pressure, and cricothyroid and thyroarytenoid muscle activation. This nonlinear mapping is trained solely with 13,000 Monte Carlo simulations of a voice production model that utilizes a symmetric triangular body-cover model of the vocal folds. The performance of the method was compared against laboratory data from synchronous recordings of oral airflow, intraoral pressure, microphone, and neck-surface vibration in 79 vocally healthy female participants uttering consecutive /pæ/ syllable strings at comfortable, loud, and soft levels.
The mean absolute error and root-mean-square error for estimating the mean subglottal pressure were 191 Pa (1.95 cm H2O) and 243 Pa (2.48 cm H2O), respectively. These values are comparable to those of previous studies, but the method has the key advantages of not requiring subject-specific training and of yielding more output measures. The validation of vocal fold collision pressure and laryngeal muscle activation was performed with synthetic values as reference. These initial results provide valuable insight for further vocal fold model refinement and constitute a proof of concept that the proposed machine learning method is a feasible option for providing physiologically relevant measures for laboratory and ambulatory assessment of vocal function.
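The two error metrics and the pressure unit conversion above can be checked with a few lines of code. Because 1 cm H2O equals 98.0665 Pa, 191 Pa ≈ 1.95 cm H2O and 243 Pa ≈ 2.48 cm H2O. The function names are illustrative.

```python
import math

PA_PER_CM_H2O = 98.0665  # pascals per centimetre of water (conventional value)


def pa_to_cm_h2o(pressure_pa):
    """Convert a pressure in pascals to centimetres of water."""
    return pressure_pa / PA_PER_CM_H2O


def mean_absolute_error(estimated, reference):
    return sum(abs(e - r) for e, r in zip(estimated, reference)) / len(reference)


def root_mean_square_error(estimated, reference):
    squared = sum((e - r) ** 2 for e, r in zip(estimated, reference))
    return math.sqrt(squared / len(reference))
```

RMSE is always at least as large as MAE, and the gap between them widens when a few estimates have large errors. The reported 243 Pa versus 191 Pa is consistent with this property.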
Objective: Calibrated horizontal measurements (e.g., in mm) from endoscopic procedures could be used to advance evidence-based practice and personalized medicine. However, the size of an object in endoscopic images is not readily calibrated and depends on multiple factors, including the distance between the endoscope and the target surface. Additionally, acquired images may have significant nonlinear distortion that would further complicate calibrated measurements. This study used a recently developed in vivo laser-projection fiberoptic laryngoscope and proposed a method for calibrated spatial measurements.
Method: A set of circular grids was recorded at multiple working distances. A statistical model was trained to map the object's pixel length, the working distance, and the spatial location of the target object to its length in mm.
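The Method does not specify the form of the statistical model. As an illustration only, the sketch below fits a simplified pinhole-style mapping:

mm = px · d · (k0 + k1 · r²)

Here px is the object's pixel length, d is the working distance, and r is the object's radial distance from the image centre. The r² term is a crude stand-in for spatial location and radial lens distortion. The model form, variable names, and synthetic data are assumptions and do not represent the authors' model. Both coefficients are obtained by ordinary least squares.

```python
def fit_calibration(samples):
    """Least-squares fit of mm = px * d * (k0 + k1 * r**2).

    samples: iterable of (pixel_length, working_distance, radius_px, true_mm)
    Returns (k0, k1).
    """
    s_aa = s_ab = s_bb = s_ay = s_by = 0.0
    for px, d, r, mm in samples:
        a = px * d          # regressor for k0
        b = px * d * r * r  # regressor for k1 (radial term)
        s_aa += a * a
        s_ab += a * b
        s_bb += b * b
        s_ay += a * mm
        s_by += b * mm
    # Solve the 2x2 normal equations by Cramer's rule.
    det = s_aa * s_bb - s_ab * s_ab
    if det == 0.0:
        raise ValueError("samples do not constrain both coefficients")
    k0 = (s_ay * s_bb - s_ab * s_by) / det
    k1 = (s_aa * s_by - s_ab * s_ay) / det
    return k0, k1


def predict_mm(px, d, r, k0, k1):
    return px * d * (k0 + k1 * r * r)


# Synthetic grid measurements generated from known (hypothetical) coefficients.
TRUE_K0, TRUE_K1 = 0.002, 1e-7
grid = [(px, d, r, px * d * (TRUE_K0 + TRUE_K1 * r * r))
        for px, d, r in [(100, 10, 0), (50, 20, 100), (80, 15, 200), (120, 12, 50)]]
k0, k1 = fit_calibration(grid)
```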
Result: A detailed analysis of the performance of the proposed method is presented. The analyses showed that the accuracy of the proposed method does not depend on the working distance or the length of the target object. The estimated average magnitude of error was 0.27 mm, one-third that of the existing alternative.
Conclusion: The presented method can achieve sub-millimeter accuracy in horizontal measurement.
Significance: Evidence-based practice and personalized medicine could significantly benefit from the proposed method. Implications of the findings for other endoscopic procedures are also discussed.
Cepstrum-based voice measures, such as smoothed cepstral peak prominence (CPPS), are influenced by voice sound pressure level (SPL) in vocally healthy adults. Because it is unclear whether similar effects hold in adults with voice disorders and how these effects interact with natural fundamental frequency (fo) changes, this study examines the effects of voice SPL and fo on CPPS in women with vocal hyperfunction and vocally healthy controls.
Retrospective matched case-control study.
Fifty-eight women with vocal hyperfunction were individually matched with 58 vocally healthy women for occupation and approximate age. The patient group comprised women exhibiting phonotraumatic vocal hyperfunction associated with vocal fold nodules (n = 39) or polyps (n = 5), and nonphonotraumatic vocal hyperfunction associated with primary muscle tension dysphonia (n = 14). All participants sustained the vowel /a/ at soft, comfortable, and loud levels. Voice SPL, fo, and CPPS (dB) were computed from acoustic voice recordings using Praat. The effects of loudness condition, measured voice SPL, and fo on CPPS were assessed with linear mixed models. Pairwise correlations among voice SPL, fo, and CPPS were assessed using multiple regression analysis.
Increasing voice SPL correlated significantly (P < 0.001) with higher CPPS in both the patient (r2 = 0.53) and normative groups (r2 = 0.45). fo had statistically significant effects on CPPS (P < 0.001), but the relationship was weak in both the patient (r2 = 0.02) and control groups (r2 = 0.05).
In women with and without voice disorders, CPPS is strongly affected by the individual's voice SPL during vowel phonation. Future studies could investigate how these effects should be controlled for to improve the diagnostic value of acoustic-based cepstral measures.
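The r2 values above describe how much CPPS variance a linear relation with SPL (or fo) explains. The sketch below is an ordinary least-squares fit. It pools all observations and ignores the repeated measurements per participant that the study's linear mixed models account for. The sample values are hypothetical.

```python
from statistics import fmean


def linear_fit(x, y):
    """Ordinary least squares y = intercept + slope * x; returns (slope, intercept, r2)."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


spl_db = [70.0, 75.0, 80.0, 85.0, 90.0]  # hypothetical sustained-vowel SPLs
cpps_db = [9.0, 10.5, 10.0, 12.5, 13.0]  # hypothetical CPPS values
slope, intercept, r2 = linear_fit(spl_db, cpps_db)
```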
Purpose This study attempts to gain insights into the role of daily voice use in the etiology and pathophysiology of phonotraumatic vocal hyperfunction (PVH) by applying a logistic regression-based daily phonotrauma index (DPI) to predict group-based improvements in patients with PVH after laryngeal surgery and/or postsurgical voice therapy. Method A custom-designed ambulatory voice monitor was used to collect 1 week of pre- and postsurgery data from 27 female patients with PVH; 13 of these patients were also monitored after postsurgical voice therapy. Normative weeklong data were obtained from 27 matched controls. Each week was represented by the DPI, a combination of neck-surface acceleration magnitude skewness and the standard deviation of the difference between the first and second harmonic amplitudes (H1-H2). Results Compared to pretreatment, the DPI significantly decreased in the patient group after surgery (Cohen's d effect size = -0.86) and voice therapy (d = -1.06). The patient group's DPI normalized only after voice therapy. Conclusions The DPI produced the expected pattern of improved ambulatory voice use across laryngeal surgery and postsurgical voice therapy in a group of patients with PVH. The results were interpreted as providing new objective information about the role of daily voice use in the etiology and pathophysiology of PVH. The DPI is viewed as an estimate of potential vocal fold trauma that relies on combining the long-term distributional characteristics of two parameters representing the magnitude of phonatory forces (neck-surface acceleration magnitude) and vocal fold closure dynamics (H1-H2). Further validation of the DPI is needed to better understand its potential clinical use.
Purpose The purpose of this study was to determine whether estimates of glottal aerodynamic measures based on neck-surface vibration are comparable to those previously obtained using oral airflow and air pressure signals (Espinoza et al., 2017) in terms of discriminating patients with phonotraumatic and nonphonotraumatic vocal hyperfunction (PVH and NPVH) from vocally healthy controls. Method Consecutive /pæ/ syllables at comfortable and loud levels were produced by 16 women with PVH (organic vocal fold lesions), 16 women with NPVH (primary muscle tension dysphonia), and 32 vocally healthy women who were each matched to a patient according to age and occupation. Subglottal impedance-based inverse filtering of the anterior neck-surface accelerometer (ACC) signal yielded estimates of peak-to-peak glottal airflow, open quotient, and maximum flow declination rate. Average subglottal pressure and microphone-based sound pressure level (SPL) were also estimated from the ACC signal using subject-specific linear regression models. The ACC-based measures of glottal aerodynamics were normalized for SPL and statistically compared between each patient group and its matched-control group. Results Patients with PVH and NPVH exhibited lower SPL-normalized glottal aerodynamic values than their respective control subjects (p values ranging from < .01 to .07) with very large effect sizes (1.04-2.16), regardless of loudness condition or measurement method (i.e., ACC-based values maintained discriminatory power). Conclusions The results of this study demonstrate that ACC-based estimates of most glottal aerodynamic measures are comparable to those previously obtained from oral airflow and air pressure (Espinoza et al., 2017) in terms of differentiating between hyperfunctional (PVH and NPVH) and normal vocal function.
ACC-based estimates of glottal aerodynamic measures may be used to assess vocal function during continuous speech, enabling assessment of daily voice use during ambulatory monitoring that could provide better insight into the pathophysiological mechanisms associated with vocal hyperfunction.
Purpose The purpose of this viewpoint article is to facilitate research on vocal hyperfunction (VH). VH is implicated in the most commonly occurring types of voice disorders, but there remains a pressing need to increase our understanding of the etiological and pathophysiological mechanisms associated with VH to improve the prevention, diagnosis, and treatment of VH-related disorders. Method A comprehensive theoretical framework for VH is proposed based on an integration of prevailing clinical views and research evidence. Results The fundamental structure of the current framework is based on a previous (simplified) version that was published over 30 years ago (Hillman et al., 1989). A central premise of the framework is that there are two primary manifestations of VH (phonotraumatic VH and nonphonotraumatic VH) and that multiple factors contribute and interact in different ways to cause and maintain these two types of VH. Key hypotheses are presented about the way different factors may contribute to phonotraumatic VH and nonphonotraumatic VH and how the associated disorders may respond to treatment. Conclusions This updated and expanded framework is meant to help guide future research, particularly the design of longitudinal studies, which can lead to a refinement in knowledge about the etiology and pathophysiology of VH-related disorders. Such new knowledge should lead to further refinements in the framework and serve as a basis for improving the prevention and evidence-based clinical management of VH.
Speakers typically modify their voice in the presence of increased background noise levels, exhibiting the classic Lombard effect. Lombard-related characteristics during everyday activities were recorded from 17 vocally healthy women who wore an acoustic noise dosimeter and an ambulatory voice monitor. When correlation coefficients were greater than 0.4, the linear relationship between vocal sound pressure level and environmental noise level exhibited an average slope of 0.54 dB/dB and an average vocal level of 72.8 dB SPL at 50 dBA. These results, coupled with analyses of spectral and cepstral vocal function measures, provide normative ambulatory Lombard characteristics for comparison with patients with voice-use-related disorders.
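The reported linear Lombard relationship can be applied directly. With the group-average slope of 0.54 dB/dB and 72.8 dB SPL at 50 dBA, a rise in noise from 50 to 70 dBA predicts roughly 72.8 + 0.54 × 20 = 83.6 dB SPL. The sketch below fits the same two parameters for a single speaker. It keeps the fit only when the correlation exceeds 0.4, which mirrors the criterion above. The function names and data are hypothetical.

```python
import math
from statistics import fmean


def lombard_fit(noise_dba, voice_db_spl, min_r=0.4, ref_noise_dba=50.0):
    """Fit voice SPL against noise level; return (slope, level at ref noise, r) or None."""
    mx, my = fmean(noise_dba), fmean(voice_db_spl)
    sxx = sum((x - mx) ** 2 for x in noise_dba)
    syy = sum((y - my) ** 2 for y in voice_db_spl)
    sxy = sum((x - mx) * (y - my) for x, y in zip(noise_dba, voice_db_spl))
    r = sxy / math.sqrt(sxx * syy)
    if r <= min_r:
        return None  # relationship too weak to characterize a Lombard slope
    slope = sxy / sxx
    level_at_ref = my + slope * (ref_noise_dba - mx)
    return slope, level_at_ref, r


def predicted_voice_spl(noise_dba, slope=0.54, level_at_50=72.8):
    """Voice SPL predicted by the group-average relationship reported above."""
    return level_at_50 + slope * (noise_dba - 50.0)
```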
Modern operational environments can place significant demands on a service member's cognitive resources, increasing the risk of errors or mishaps due to overburden. The ability to monitor cognitive burden and associated performance within operational environments is critical to improving mission readiness. As a key step toward a field-ready system, we developed a simulated marksmanship scenario with an embedded working memory task in an immersive virtual reality environment. As participants performed the marksmanship task, they were instructed to remember numbered targets and recall the sequence of those targets at the end of the trial. Low and high cognitive load conditions were defined as the recall of three- and six-digit strings, respectively. Physiological and behavioral signals recorded included speech, heart rate, breathing rate, and body movement. These features were input into a random forest classifier that significantly discriminated between the low- and high-cognitive load conditions (AUC = 0.94). Behavioral features of gait were the most informative, followed by features of speech. We also showed the capability to predict performance on the digit recall (AUC = 0.71) and marksmanship (AUC = 0.58) tasks. The experimental framework can be leveraged in future studies to quantify the interaction of other types of stressors and their impact on operational cognitive and physical performance.
The goal of this study was to employ frequently used analysis methods and tasks to identify values for cepstral peak prominence (CPP) that can aid clinical voice evaluation. Experiment 1 identified CPP values to distinguish speakers with and without voice disorders. Experiment 2 was an initial attempt to estimate auditory-perceptual ratings of overall dysphonia severity using CPP values.
CPP was computed using the Analysis of Dysphonia in Speech and Voice (ADSV) program and Praat. Experiment 1 included recordings from 295 patients with medically diagnosed voice disorders and 50 vocally healthy control speakers. Speakers produced sustained /a/ vowels and the English-language Rainbow Passage. CPP cutoff values that best distinguished patient and control speakers were identified. Experiment 2 analyzed recordings from 32 English speakers with varying dysphonia severity and provided preliminary validation of the Experiment 1 cutoffs. Speakers sustained the /a/ vowel and read four sentences from the Consensus Auditory-Perceptual Evaluation of Voice protocol. Trained listeners provided auditory-perceptual ratings of overall dysphonia for the recordings. These ratings were then estimated from CPP values with a linear regression model, whose performance was evaluated using the coefficient of determination (r2).
Experiment 1 identified CPP cutoff values of 11.46 dB (ADSV) and 14.45 dB (Praat) for the sustained /a/ vowels and 6.11 dB (ADSV) and 9.33 dB (Praat) for the Rainbow Passage. CPP values below those thresholds indicated the presence of a voice disorder with up to 94.5% accuracy. In Experiment 2, CPP values estimated ratings of overall dysphonia with r2 values up to .74.
The CPP cutoff values identified in Experiment 1 provide normative reference points for clinical voice evaluation based on sustained /a/ vowels and the Rainbow Passage. Experiment 2 provides an initial predictive framework that can be used to relate CPP values to the auditory perception of overall dysphonia severity based on sustained /a/ vowels and Consensus Auditory-Perceptual Evaluation of Voice sentences.
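The Experiment 1 cutoffs amount to a simple lookup: a CPP value below the cutoff for the same program and task flags a likely voice disorder. The cutoff values below are copied from the Results, while the dictionary layout and function name are illustrative. ADSV and Praat compute CPP differently, so a value from one program should never be compared with a cutoff from the other.

```python
CPP_CUTOFFS_DB = {
    ("ADSV", "sustained /a/"): 11.46,
    ("Praat", "sustained /a/"): 14.45,
    ("ADSV", "Rainbow Passage"): 6.11,
    ("Praat", "Rainbow Passage"): 9.33,
}


def flags_voice_disorder(cpp_db, program, task):
    """True if CPP falls below the Experiment 1 cutoff for this program and task."""
    try:
        cutoff = CPP_CUTOFFS_DB[(program, task)]
    except KeyError:
        raise ValueError(f"no cutoff for program={program!r}, task={task!r}") from None
    return cpp_db < cutoff
```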