The purpose of this study was to examine the psychometric properties of an ecological vocal effort scale linked to a voicing task.
Thirty-eight patients with nodules, 18 patients with muscle tension dysphonia, and 45 vocally healthy control individuals participated in a week of ambulatory voice monitoring. A global vocal status question was asked hourly throughout the day. Participants produced a vowel–consonant–vowel syllable string and rated the vocal effort needed to produce the task on a visual analog scale. Test–retest reliability was calculated for a subset using the intraclass correlation coefficient, ICC(A, 1). Construct validity was assessed by (a) comparing the weeklong vocal effort ratings between the patient and control groups and (b) comparing weeklong vocal effort ratings before and after voice rehabilitation in a subset of 25 patients. Cohen's d, the standard error of measurement (SEM), and the minimal detectable change (MDC) assessed sensitivity. The minimal clinically important difference (MCID) assessed responsiveness.
Test–retest reliability was excellent, ICC(A, 1) = .96. Weeklong mean effort was statistically higher in the patients than in controls (d = 1.62) and lower after voice rehabilitation (d = 1.75), supporting construct validity and sensitivity. SEM was 4.14, MDC was 11.47, and MCID was 9.74. Since the MCID was within the error of the measure, we must rely upon the MDC to detect real changes in ecological vocal effort.
The ecological vocal effort scale offers a reliable, valid, and sensitive method of monitoring vocal effort changes during the daily life of individuals with and without vocal hyperfunction.
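The relationship between the reported SEM and MDC can be checked with a short sketch. It assumes the conventional formulas SEM = SD × sqrt(1 − ICC) and MDC95 = 1.96 × sqrt(2) × SEM; the abstract does not state which formulas were used, and the function names are illustrative.

```python
import math

def sem_from_sd(sd: float, icc: float) -> float:
    """Standard error of measurement from a score SD and a reliability coefficient."""
    return sd * math.sqrt(1.0 - icc)

def mdc(sem: float, z: float = 1.96) -> float:
    """Minimal detectable change; z = 1.96 corresponds to 95% confidence (assumed)."""
    return z * math.sqrt(2.0) * sem

reported_sem = 4.14   # value reported in the abstract
reported_mcid = 9.74  # value reported in the abstract

mdc95 = mdc(reported_sem)
print(f"MDC95 from the rounded SEM: {mdc95:.2f}")
print("MCID falls below the MDC:", reported_mcid < mdc95)
```

Under these assumptions the result is close to the reported MDC of 11.47; the small remaining difference would be consistent with the SEM having been rounded before reporting.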
Objectives: Singers, college students, and females are groups known to be at an elevated risk of developing functional/hyperfunctional voice disorders; therefore, female college students majoring in vocal performance may be at an even higher risk. To mitigate this risk, it would be helpful to know the "safe limits" for voice use that would help maintain vocal health in this vulnerable group, but there is a paucity of high-quality objective information upon which to base such limits. This study employed weeklong ambulatory voice monitoring in a large group of vocally healthy female college student singers to begin providing the types of objective data that could be used to help develop improved vocal health guidelines.
Methods: Participants included 64 vocally healthy females currently enrolled in a vocal performance or similar program at a college or university. An ambulatory voice monitor recorded neck-surface acceleration throughout a typical week. A singing classifier was applied to the data to separate singing from speech. Weeklong vocal dose measures and distributional characteristics for standard voice measures were computed separately for singing and speech, and for both types of phonation combined.
Results: Participants spent 6.2% of the total monitoring time speaking and 2.1% singing (with total phonation time being 8.4%). Singing had a higher fo mode, more pitch variability, higher average sound pressure level (SPL), negatively skewed SPL distributions, lower average CPP, and higher H1-H2 values than speaking.
Conclusions: These results provide a basis for beginning to establish vocal health guidelines for female students enrolled in college-level vocal performance programs and for future studies of the types of voice disorders that are common in this group. Results also demonstrate the potential value that ambulatory voice monitoring may have in helping to objectively identify vocal behaviors that could contribute to voice problems in this population.
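As a rough illustration of how the speaking and singing percentages relate to total phonation time, the sketch below counts frame labels. The label names and the frame counts are hypothetical; they were chosen only to mirror the reported percentages and are not study data.

```python
from collections import Counter

def phonation_percentages(frame_labels):
    """Percent of monitored frames labeled as speech, singing, and either one.

    frame_labels: sequence of 'speech', 'singing', or 'unvoiced' (assumed label set).
    """
    counts = Counter(frame_labels)
    total = len(frame_labels)
    speech = 100.0 * counts["speech"] / total
    singing = 100.0 * counts["singing"] / total
    return {"speech": speech, "singing": singing, "phonation": speech + singing}

# Hypothetical week reduced to 1,000 frames for illustration.
labels = ["speech"] * 62 + ["singing"] * 21 + ["unvoiced"] * 917
percentages = phonation_percentages(labels)
print(percentages)
```

With rounded inputs the combined value can differ from the reported total in the last digit, as it does between 6.2% + 2.1% and the reported 8.4%.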
Purpose The aim of this study was to use the Daily Phonotrauma Index (DPI) to quantify group-based changes in the daily voice use of patients with phonotraumatic vocal hyperfunction (PVH) after receiving voice therapy as the sole treatment. This is part of an ongoing effort to validate an updated theoretical framework for PVH. Method A custom-designed ambulatory voice monitor was used to collect 1 week of pre- and posttreatment data from 52 female patients with PVH. Normative weeklong data were also obtained from 52 matched controls. Each week was represented by the DPI, which is a combination of neck-surface acceleration magnitude skewness and the standard deviation of the difference between the first and second harmonic magnitudes. Results Compared to pretreatment, the DPI statistically decreased towards normal in the patient group after treatment (Cohen's d = -0.25). The posttreatment patient group's DPI was still significantly higher than the control group (d = 0.68). Conclusions The DPI showed the pattern of improved ambulatory voice use in a group of patients with PVH following voice therapy that was predicted by the updated theoretical framework. Per the prediction, voice therapy was associated with a decreased potential for phonotrauma in daily voice use, but the posttreatment patient group data were still significantly different from the normative control group data. This posttreatment difference is interpreted as reflecting the impact on voice use of the persistence of phonotrauma-induced structural changes to the vocal folds. Further validation of the DPI is needed to better understand its potential clinical use.
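The effect sizes above are reported as Cohen's d. A minimal sketch of one common variant, using the pooled sample standard deviation, is shown here; the abstract does not say whether a pooled or a paired formulation was used, and the numbers are made up for illustration.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d for group_a minus group_b using the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

# Made-up index values, not study data.
posttreatment = [1.0, 2.0, 3.0, 4.0, 5.0]
pretreatment = [2.0, 3.0, 4.0, 5.0, 6.0]

d = cohens_d(posttreatment, pretreatment)
print(f"d = {d:.2f}")  # a negative d means the index decreased after treatment
```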
Purpose Previous ambulatory voice monitoring studies have included many singers and have combined speech and singing in the analyses. This study applied a singing classifier to the ambulatory recordings of singers with phonotrauma and healthy controls to determine if analyzing speech and singing separately would reveal voice use differences that could provide new insights into the etiology and pathophysiology of phonotrauma in this at-risk population. Method Forty-two female singers with phonotrauma (vocal fold nodules or polyps) and 42 healthy matched controls were monitored using an ambulatory voice monitor. Weeklong statistics (average, standard deviation, skewness, kurtosis) for sound pressure level (SPL), fundamental frequency, cepstral peak prominence, the magnitude ratio of the first two harmonics (H1-H2), and three vocal dose measures were computed from the neck surface acceleration signal and separated into singing and speech using a singing classifier. Results Mixed analysis of variance models found expected differences between singing and speech in each voice parameter, except SPL kurtosis. SPL skewness, SPL kurtosis, and all H1-H2 distributional parameters differentiated patients and controls when singing and speech were combined. Interaction effects were found in H1-H2 kurtosis and all vocal dose measures. Patients had significantly higher vocal doses in speech compared to controls. Conclusions Consistent with prior work, the pathophysiology of phonotrauma in singers is characterized by more abrupt/complete glottal closure (decreased mean and variation for H1-H2) and increased laryngeal forces (negatively skewed SPL distribution) during phonation. Application of a singing classifier to weeklong data revealed that singers with phonotrauma spent more time speaking on a weekly basis, but not more time singing, compared to controls.
Results are used as a basis for hypothesizing about the role of speaking voice in the etiology of phonotraumatic vocal hyperfunction in singers.
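The weeklong statistics named above (average, standard deviation, skewness, kurtosis) can be sketched as follows. This version uses population moments and reports excess kurtosis; the estimator conventions used in the study are not stated, and the SPL values are invented to show a left-tailed distribution.

```python
import math

def distribution_summary(values):
    """Mean, SD, skewness, and excess kurtosis from population (biased) moments."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return {
        "mean": mu,
        "sd": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis (0 for a normal distribution)
    }

# Invented frame-level SPL values (dB) clustered high with a tail toward low levels.
spl_frames = [60.0, 70.0, 72.0, 74.0, 75.0, 76.0, 77.0, 78.0]
summary = distribution_summary(spl_frames)
print(summary)
```

A negative skewness here corresponds to the "negatively skewed SPL distribution" described in the text: most values sit toward the high end with a tail toward lower levels.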
Purpose The purpose of this study was to obtain a more comprehensive understanding of the pathophysiology and impact on daily voice use of nonphonotraumatic vocal hyperfunction (NPVH). Method An ambulatory voice monitor collected 1 week of data from 36 patients with NPVH and 36 vocally healthy matched controls. A subset of 11 patients with NPVH were monitored after voice therapy. Daily voice use measures included neck-skin acceleration magnitude, fundamental frequency (fo), cepstral peak prominence (CPP), and the difference between the first and second harmonic magnitudes (H1-H2). Additional comparisons included 118 patients with phonotraumatic vocal hyperfunction (PVH) and 89 additional vocally healthy controls. Results The NPVH group, compared to the matched control group, exhibited increased fo (Cohen's d = 0.6), reduced CPP (d = -0.9), and less positive H1-H2 skewness (d = -1.1). Classifiers used CPP mean and H1-H2 mode to maximally differentiate the NPVH and matched control groups (area under the receiver operating characteristic curve of 0.78). Classifiers performed well on unseen data: the logit decreased in patients with NPVH after therapy; ≥ 85% of the control and PVH groups were identified as "normal" or "not NPVH," respectively. Conclusions The NPVH group's daily voice use is less periodic (CPP), is higher pitched (fo), and has less abrupt vocal fold closure (H1-H2 skew) compared to the matched control group. The combination of CPP mean and H1-H2 mode appears to reflect a pathophysiological continuum in NPVH patients of inefficient phonation with minimal potential for phonotrauma. Further validation of the classification model is needed to better understand potential clinical uses. Supplemental Material https://doi.org/10.23641/asha.14390771.
The ambulatory assessment of vocal function can be significantly enhanced by having access to physiologically based features that describe underlying pathophysiological mechanisms in individuals with voice disorders. This type of enhancement can improve methods for the prevention, diagnosis, and treatment of behaviorally based voice disorders. Unfortunately, the direct measurement of important vocal features such as subglottal pressure, vocal fold collision pressure, and laryngeal muscle activation is impractical in laboratory and ambulatory settings. In this study, we introduce a method to estimate these features during phonation from a neck-surface vibration signal through a framework that integrates a physiologically relevant model of voice production and machine learning tools. The signal from a neck-surface accelerometer is first processed using subglottal impedance-based inverse filtering to yield an estimate of the unsteady glottal airflow. Seven aerodynamic and acoustic features are extracted from the neck surface accelerometer and an optional microphone signal. A neural network architecture is selected to provide a mapping between the seven input features and subglottal pressure, vocal fold collision pressure, and cricothyroid and thyroarytenoid muscle activation. This non-linear mapping is trained solely with 13,000 Monte Carlo simulations of a voice production model that utilizes a symmetric triangular body-cover model of the vocal folds. The performance of the method was compared against laboratory data from synchronous recordings of oral airflow, intraoral pressure, microphone, and neck-surface vibration in 79 vocally healthy female participants uttering consecutive /pæ/ syllable strings at comfortable, loud, and soft levels. 
The mean absolute error and root-mean-square error for estimating the mean subglottal pressure were 191 Pa (1.95 cm H2O) and 243 Pa (2.48 cm H2O), respectively, which are comparable with previous studies but with the key advantage of not requiring subject-specific training and yielding more output measures. The validation of vocal fold collision pressure and laryngeal muscle activation was performed with synthetic values as reference. These initial results provide valuable insight for further vocal fold model refinement and constitute a proof of concept that the proposed machine learning method is a feasible option for providing physiologically relevant measures for laboratory and ambulatory assessment of vocal function.
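The two error metrics and the pascal to cm H2O conversion can be sketched as below. The conversion factor is the conventional 98.0665 Pa per cm H2O; the estimate and reference arrays are invented for illustration and are not study data.

```python
import math

PA_PER_CM_H2O = 98.0665  # conventional conversion factor

def pa_to_cm_h2o(pascals: float) -> float:
    """Convert a pressure in pascals to centimeters of water."""
    return pascals / PA_PER_CM_H2O

def mean_absolute_error(estimates, references):
    return sum(abs(e - r) for e, r in zip(estimates, references)) / len(estimates)

def root_mean_square_error(estimates, references):
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimates, references)) / len(estimates))

# Reported errors converted to cm H2O.
print(round(pa_to_cm_h2o(191), 2), round(pa_to_cm_h2o(243), 2))

# Invented subglottal pressure estimates and references (Pa).
estimated_ps = [800.0, 1000.0, 600.0]
reference_ps = [700.0, 1100.0, 600.0]
mae = mean_absolute_error(estimated_ps, reference_ps)
rmse = root_mean_square_error(estimated_ps, reference_ps)
```

The RMSE is never smaller than the MAE because it weights large errors more heavily, which matches the ordering of the two reported values.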
Purpose This study attempts to gain insights into the role of daily voice use in the etiology and pathophysiology of phonotraumatic vocal hyperfunction (PVH) by applying a logistic regression-based daily phonotrauma index (DPI) to predict group-based improvements in patients with PVH after laryngeal surgery and/or postsurgical voice therapy. Method A custom-designed ambulatory voice monitor was used to collect 1 week of pre- and postsurgery data from 27 female patients with PVH; 13 of these patients were also monitored after postsurgical voice therapy. Normative weeklong data were obtained from 27 matched controls. Each week was represented by the DPI, which combines neck-surface acceleration magnitude skewness and the standard deviation of the difference between the first and second harmonic amplitudes (H1-H2). Results Compared to pretreatment, the DPI significantly decreased in the patient group after surgery (Cohen's d effect size = -0.86) and voice therapy (d = -1.06). The patient group DPI only normalized after voice therapy. Conclusions The DPI produced the expected pattern of improved ambulatory voice use across laryngeal surgery and postsurgical voice therapy in a group of patients with PVH. The results were interpreted as providing new objective information about the role of daily voice use in the etiology and pathophysiology of PVH. The DPI is viewed as an estimate of potential vocal fold trauma that relies on combining the long-term distributional characteristics of two parameters representing the magnitude of phonatory forces (neck-surface acceleration magnitude) and vocal fold closure dynamics (H1-H2). Further validation of the DPI is needed to better understand its potential clinical use.
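To make the idea of a logistic regression-based index concrete, the sketch below combines two weeklong features through a logistic function. The coefficients are placeholders invented for illustration and are NOT the published DPI weights; their negative signs only reflect the assumption that more negative skewness and lower H1-H2 variability indicate greater potential for phonotrauma. Whether the DPI is reported as the logit or as a probability is not stated here, so both are returned.

```python
import math

# Placeholder coefficients for illustration only; not the published DPI model.
INTERCEPT = 0.0
WEIGHT_ACC_SKEWNESS = -1.0
WEIGHT_H1H2_SD = -1.0

def logistic_index(acc_magnitude_skewness: float, h1h2_sd: float):
    """Return (logit, probability) for a two-feature logistic model."""
    logit = (INTERCEPT
             + WEIGHT_ACC_SKEWNESS * acc_magnitude_skewness
             + WEIGHT_H1H2_SD * h1h2_sd)
    probability = 1.0 / (1.0 + math.exp(-logit))
    return logit, probability

# Hypothetical weeks: one with more negative skewness and lower H1-H2 variability.
logit_a, prob_a = logistic_index(acc_magnitude_skewness=-0.8, h1h2_sd=0.5)
logit_b, prob_b = logistic_index(acc_magnitude_skewness=0.2, h1h2_sd=1.5)
print(prob_a > prob_b)
```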
Speakers typically modify their voice in the presence of increased background noise levels, exhibiting the classic Lombard effect. Lombard-related characteristics during everyday activities were recorded from 17 vocally healthy women who wore an acoustic noise dosimeter and ambulatory voice monitor. The linear relationship between vocal sound pressure level and environmental noise level exhibited an average slope of 0.54 dB/dB and value of 72.8 dB SPL at 50 dBA when correlation coefficients were greater than 0.4. These results, coupled with analyses of spectral and cepstral vocal function measures, provide normative ambulatory Lombard characteristics for comparison with patients with voice-use related disorders.
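The slope and the level at 50 dBA describe a fitted line for each speaker. A minimal sketch of that fit, with a correlation threshold applied before a speaker is summarized, is given below. The data points are synthetic and were constructed to lie exactly on the reported average line; the function names and the per-speaker filtering logic are assumptions.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares slope and intercept, plus Pearson's r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x, sxy / math.sqrt(sxx * syy)

def lombard_summary(noise_dba, voice_db_spl, min_r=0.4, reference_dba=50.0):
    """Slope (dB/dB) and fitted voice level at the reference noise level, or None if r <= min_r."""
    slope, intercept, r = fit_line(noise_dba, voice_db_spl)
    if r <= min_r:
        return None
    return {"slope": slope, "spl_at_reference": intercept + slope * reference_dba, "r": r}

# Synthetic points on the line with slope 0.54 dB/dB passing through 72.8 dB SPL at 50 dBA.
noise = [40.0, 50.0, 60.0, 70.0, 80.0]
voice = [67.4, 72.8, 78.2, 83.6, 89.0]
result = lombard_summary(noise, voice)
print(result)
```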
Given the established linear relationship between neck surface vibration magnitude and mean subglottal pressure (Ps) in vocally healthy speakers, the purpose of this study was to better understand the impact of the presence of a voice disorder on this baseline relationship.
Data were obtained from participants with voice disorders representing a variety of glottal conditions, including phonotraumatic vocal hyperfunction, nonphonotraumatic vocal hyperfunction, and unilateral vocal fold paralysis. Participants were asked to repeat /p/-vowel syllable strings from loud-to-soft loudness levels in multiple vowel contexts (/pa/, /pi/, /pu/) and pitch levels (comfortable, higher than comfortable, lower than comfortable). Three statistical metrics were computed to analyze the regression line between neck surface accelerometer (ACC) signal magnitude and Ps within and across pitch, vowel, and voice disorder category: coefficient of determination (r²), slope, and intercept. Three linear mixed-effects models were used to evaluate the impact of voice disorder category, pitch level, and vowel context on the relationship between ACC signal magnitude and Ps.
The relationship between ACC signal magnitude and Ps was statistically different in patients with voice disorders than in vocally healthy controls; patients exhibited higher levels of Ps given similar values of ACC signal magnitude. Negligible effects were found for pitch condition within each voice disorder category, and negligible-to-small effects were found for vowel context. The mean of patient-specific r² values was .63, ranging from .13 to .92.
The baseline linear relationship between ACC signal magnitude and Ps is affected by the presence of a voice disorder, with the relationship being participant-specific. Further work is needed to improve ACC-based prediction of Ps across treatment and during naturalistic speech production.
Previous work using ambulatory voice recordings has shown no differences in average vocal behavior between patients with phonotraumatic vocal hyperfunction and matched controls. This study used larger groups to replicate these results and expanded the analysis to include distributional characteristics of ambulatory voice use and measures indicative of glottal closure.
Subjects included 180 adult women: 90 diagnosed with vocal fold nodules or polyps and 90 age-, sex-, and occupation-matched controls with no history of voice disorders. Weeklong summary statistics (average, variability, skewness, kurtosis) of voice use were computed from neck-surface acceleration recorded using an ambulatory voice monitor. Voice measures included estimates of sound pressure level (SPL), fundamental frequency (fo), cepstral peak prominence, and the difference between the first and second harmonic magnitudes (H1–H2).
Statistical comparisons resulted in medium–large differences (Cohen's d ≥ 0.5) between groups for SPL skewness, fo variability, and H1–H2 variability. Two logistic regressions (theory-based and stepwise) found SPL skewness and H1–H2 variability to classify patients and controls based on their weekly voice data, with an area under the receiver operating characteristic curve of 0.85 and 0.82 on training and test sets, respectively.
Compared to controls, the weekly voice use of patients with phonotraumatic vocal hyperfunction reflected higher SPL tendencies (negatively skewed SPL) with more abrupt glottal closure (reduced H1–H2 variability, especially toward higher values). Further work could examine posttreatment data (e.g., after surgery and/or therapy) to determine the extent to which these differences are associated with the etiology and pathophysiology of phonotraumatic vocal fold lesions.
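The area under the receiver operating characteristic curve reported above can be understood as the probability that a randomly chosen patient receives a higher classifier score than a randomly chosen control. A minimal sketch of that pairwise definition follows; the scores are invented and the function name is illustrative.

```python
def roc_auc(scores_positive, scores_negative):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    wins = 0.0
    for p in scores_positive:
        for q in scores_negative:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_positive) * len(scores_negative))

# Invented classifier outputs, not study data.
patient_scores = [0.9, 0.8, 0.4]
control_scores = [0.7, 0.3, 0.2]
auc = roc_auc(patient_scores, control_scores)
print(f"AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of subjects, which is acceptable for group sizes like those described here; a rank-based computation would scale better for large data sets.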
Subglottal air pressure plays a major role in voice production and is a primary factor in controlling voice onset, offset, sound pressure level, glottal airflow, vocal fold collision pressures, and variations in fundamental frequency. Previous work has shown promise for the estimation of subglottal pressure from an unobtrusive miniature accelerometer sensor attached to the anterior base of the neck during typical modal voice production across multiple pitch and vowel contexts. This study expands on that work to incorporate additional accelerometer-based measures of vocal function to compensate for non-modal phonation characteristics and achieve an improved estimation of subglottal pressure. Subjects with normal voices repeated /p/-vowel syllable strings from loud-to-soft levels in multiple vowel contexts (/a/, /i/, and /u/), pitch conditions (comfortable, lower than comfortable, higher than comfortable), and voice quality types (modal, breathy, strained, and rough). Subject-specific, stepwise regression models were constructed using root-mean-square (RMS) values of the accelerometer signal alone (baseline condition) and in combination with cepstral peak prominence, fundamental frequency, and glottal airflow measures derived using subglottal impedance-based inverse filtering. Five-fold cross-validation assessed the robustness of model performance using the root-mean-square error metric for each regression model. Each cross-validation fold exhibited up to a 25% decrease in prediction error when the model incorporated multi-dimensional aspects of the accelerometer signal compared with RMS-only models. Improved estimation of subglottal pressure for non-modal phonation was thus achievable, lending to future studies of subglottal pressure estimation in patients with voice disorders and in ambulatory voice recordings.
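The cross-validation step can be illustrated with a simplified sketch. It covers only the RMS-only baseline (a single-predictor linear fit), not the stepwise multi-feature models, and it forms folds by interleaving sample indices, which is an assumption since the fold assignment is not described. The data are synthetic.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def k_fold_rmse(x, y, k=5):
    """Root-mean-square prediction error on each held-out fold."""
    fold_errors = []
    for fold in range(k):
        train = [i for i in range(len(x)) if i % k != fold]
        held_out = [i for i in range(len(x)) if i % k == fold]
        slope, intercept = fit_line([x[i] for i in train], [y[i] for i in train])
        squared = [(y[i] - (slope * x[i] + intercept)) ** 2 for i in held_out]
        fold_errors.append(math.sqrt(sum(squared) / len(squared)))
    return fold_errors

# Synthetic accelerometer RMS values and pressures with an exact linear relationship.
acc_rms = [float(i) for i in range(1, 21)]
subglottal_pressure = [2.0 * a + 1.0 for a in acc_rms]
rmse_per_fold = k_fold_rmse(acc_rms, subglottal_pressure)
print(rmse_per_fold)
```

Comparing the per-fold errors of a baseline model against a richer model is what yields a statement such as "up to a 25% decrease in prediction error"; the richer model is not reproduced here.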
The purpose of this study was to evaluate the effects of nonmodal phonation on estimates of subglottal pressure (Ps) derived from the magnitude of a neck-surface accelerometer (ACC) signal and to confirm previous findings regarding the impact of vowel contexts and pitch levels in a larger cohort of participants.
Twenty-six vocally healthy participants (18 women, 8 men) were asked to produce a series of p-vowel syllables with descending loudness in 3 vowel contexts (/a/, /i/, and /u/), 3 pitch levels (comfortable, high, and low), and 4 elicited phonatory conditions (modal, breathy, strained, and rough). Estimates of Ps for each vowel segment were obtained by averaging the intraoral air pressure plateau before and after each segment. The root-mean-square magnitude of the neck-surface ACC signal was computed for each vowel segment. Three linear mixed-effects models were used to statistically assess the effects of vowel, pitch, and phonatory condition on the linear relationship (slope and intercept) between Ps and ACC signal magnitude.
Results demonstrated statistically significant linear relationships between ACC signal magnitude and Ps within participants but with increased intercepts for the nonmodal phonatory conditions; slopes were affected to a lesser extent. Vowel and pitch contexts did not significantly affect the linear relationship between ACC signal magnitude and Ps.
The classic linear relationship between ACC signal magnitude and Ps is significantly affected when nonmodal phonation is produced by a speaker. Future work is warranted to further characterize nonmodal phonatory characteristics to improve the ACC-based prediction of Ps during naturalistic speech production.
Ambulatory voice monitoring is a promising tool for investigating phonotraumatic vocal hyperfunction (PVH), associated with the development of vocal fold lesions. Since many patients with PVH are professional vocalists, a classifier was developed to better understand phonatory mechanisms during speech and singing. Twenty singers with PVH and 20 matched healthy controls were monitored with a neck-surface accelerometer–based ambulatory voice monitor. An expert-labeled ground truth data set was used to train a logistic regression on 15 subject-pairs with fundamental frequency and autocorrelation peak amplitude as input features. Overall classification accuracy of 94.2% was achieved on the held-out test set.
Miniature high-bandwidth accelerometers on the anterior neck surface are used in laboratory and ambulatory settings to obtain vocal function measures. This study compared the widely applied L1–L2 measure (historically, H1–H2)—the difference between the log-magnitude of the first and second harmonics—computed from the glottal airflow waveform with L1–L2 derived from the raw neck-surface acceleration signal in 79 vocally healthy female speakers. Results showed a significant correlation (r = 0.72) between L1–L2 values estimated from both airflow and accelerometer signals, suggesting that raw accelerometer-based estimates of L1–L2 may be interpreted as reflecting glottal physiological parameters and voice quality attributes during phonation.
Phonotraumatic vocal hyperfunction (PVH) is associated with chronic misuse and/or abuse of voice that can result in lesions such as vocal fold nodules. The clinical aerodynamic assessment of vocal function has been recently shown to differentiate between patients with PVH and healthy controls to provide meaningful insight into pathophysiological mechanisms associated with these disorders. However, all current clinical assessment of PVH is incomplete because of its inability to objectively identify the type and extent of detrimental phonatory function that is associated with PVH during daily voice use. The current study sought to address this issue by incorporating, for the first time in a comprehensive ambulatory assessment, glottal airflow parameters estimated from a neck-mounted accelerometer and recorded to a smartphone-based voice monitor. We tested this approach on 48 patients with vocal fold nodules and 48 matched healthy-control subjects who each wore the voice monitor for a week. Seven glottal airflow features were estimated every 50 ms using an impedance-based inverse filtering scheme, and seven high-order summary statistics of each feature were computed every 5 minutes over voiced segments. Based on univariate hypothesis testing, eight glottal airflow summary statistics were found to be statistically different between patient and healthy-control groups. L1-regularized logistic regression for a supervised classification task yielded a mean (standard deviation) area under the ROC curve of 0.82 (0.25) and an accuracy of 0.83 (0.14). These results outperform the state-of-the-art classification for the same classification task and provide a new avenue to improve the assessment and treatment of hyperfunctional voice disorders.
The aim of this study was to establish reliability and validity for self-ratings of vocal status obtained during the daily activities of patients with vocal hyperfunction (VH) and matched controls.
Eighty-four patients with VH and 74 participants with normal voices answered 3 vocal status questions (difficulty producing soft, high-pitched phonation [D-SHP]; discomfort; and fatigue) on an ambulatory voice monitor at the beginning of each day, at 5-hr intervals, and at the end of each day (7 total days). Two subsets of the patient group answered the questions during a 2nd week after voice therapy (29 patients) or laryngeal surgery (16 patients).
High reliability resulted for patients (Cronbach's α = .88) and controls (α = .95). Patients reported higher D-SHP, discomfort, and fatigue (Cohen's d = 1.62-1.92) compared with controls. Patients posttherapy and postsurgery reported significantly improved self-ratings of vocal status relative to their pretreatment ratings (d = 0.70-1.13). Within-subject changes in self-ratings greater than 20 points were considered clinically meaningful.
Ratings of D-SHP, discomfort, and fatigue have adequate reliability and validity for tracking vocal status throughout daily life in patients with VH and vocally healthy individuals. These questions could help investigate the relationship between vocal symptom variability and putative contributing factors (e.g., voice use/rest, emotions).
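Cronbach's α, the reliability statistic reported above, can be sketched as follows. The example treats the three questions as items and each list position as one respondent; how the ratings were arranged for the study's calculation is not stated, and the scores are invented.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item-score lists (one score per respondent in each)."""
    k = len(items)
    n_respondents = len(items[0])
    totals = [sum(item[i] for item in items) for i in range(n_respondents)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1.0 - item_variance_sum / variance(totals))

# Invented ratings on a 0-100 scale for four respondents.
d_shp = [10.0, 20.0, 30.0, 40.0]
discomfort = [12.0, 22.0, 28.0, 42.0]
fatigue = [8.0, 18.0, 33.0, 38.0]

alpha = cronbach_alpha([d_shp, discomfort, fatigue])
print(f"alpha = {alpha:.2f}")
```

Items that rise and fall together across respondents give a total-score variance much larger than the sum of the item variances, which pushes α toward 1.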
Ambulatory monitoring of real-world voice characteristics and behavior has the potential to provide important assessment of voice and speech disorders and psychological and emotional state. In this paper, we report on the novel development of a lightweight, wireless voice monitor that synchronously records dual-channel data from an acoustic microphone and a neck-surface accelerometer embedded on a flex circuit. Lombard speech effects were investigated in pilot data from four adult speakers with normal vocal function who read a phonetically balanced paragraph in the presence of different ambient acoustic noise levels. Whereas the signal-to-noise ratio (SNR) of the microphone signal decreased in the presence of increasing ambient noise level, the SNR of the accelerometer sensor remained high. Lombard speech properties were thus robustly computed from the accelerometer signal and observed in all four speakers who exhibited increases in average estimates of sound pressure level (+2.3 dB), fundamental frequency (+21.4 Hz), and cepstral peak prominence (+1.3 dB) from quiet to loud ambient conditions. Future work calls for ambulatory data collection in naturalistic environments, where the microphone acts as a sound level meter and the accelerometer functions as a noise-robust voicing sensor to assess voice disorders, neurological conditions, and cognitive load.