Previous work using ambulatory voice recordings has shown no differences in average vocal behavior between patients with phonotraumatic vocal hyperfunction and matched controls. This study used larger groups to replicate these results and expanded the analysis to include distributional characteristics of ambulatory voice use and measures indicative of glottal closure.
Subjects included 180 adult women: 90 diagnosed with vocal fold nodules or polyps and 90 age-, sex-, and occupation-matched controls with no history of voice disorders. Weeklong summary statistics (average, variability, skewness, kurtosis) of voice use were computed from neck-surface acceleration recorded using an ambulatory voice monitor. Voice measures included estimates of sound pressure level (SPL), fundamental frequency (fo), cepstral peak prominence, and the difference between the first and second harmonic magnitudes (H1–H2).
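The weeklong summary statistics can be sketched as follows. This assumes that "variability" is the standard deviation, uses population moment estimators, and reports excess kurtosis. The study's exact estimators are not specified in this abstract. The function name `summary_statistics` is hypothetical, and the input is assumed to be one value per voiced frame (e.g., SPL in dB).

```python
import math
from typing import Dict, Sequence


def summary_statistics(values: Sequence[float]) -> Dict[str, float]:
    """Return average, variability (SD), skewness, and excess kurtosis.

    Assumption: population (biased) moment estimators; the study's exact
    estimators may differ.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("zero variance")
    return {
        "average": mean,
        "variability": math.sqrt(m2),
        "skewness": m3 / m2 ** 1.5,      # < 0: longer tail toward low values
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis (normal -> 0)
    }
```

A negative skewness, as reported for the patients' SPL, indicates a longer tail toward low values, with most frames concentrated at higher levels.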
Statistical comparisons resulted in medium–large differences (Cohen's d ≥ 0.5) between groups for SPL skewness, fo variability, and H1–H2 variability. Two logistic regression models (theory-based and stepwise) identified SPL skewness and H1–H2 variability as the features that classified patients and controls from their weekly voice data, with areas under the receiver operating characteristic curve of 0.85 and 0.82 on the training and test sets, respectively.
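For reference, Cohen's d and the area under the ROC curve can be computed as shown below. This sketch assumes the pooled-standard-deviation form of d (by common convention, 0.5 is a medium effect and 0.8 a large one) and the rank-based (Mann–Whitney) definition of AUC, with ties counted as one half. Both function names are hypothetical.

```python
import math
from typing import Sequence


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Cohen's d for two groups using the pooled (n - 1) standard deviation."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC = P(random positive outscores random negative); ties count 0.5."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```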
Compared to controls, the weekly voice use of patients with phonotraumatic vocal hyperfunction reflected higher SPL tendencies (negatively skewed SPL) with more abrupt glottal closure (reduced H1–H2 variability, especially toward higher values). Further work could examine posttreatment data (e.g., after surgery and/or therapy) to determine the extent to which these differences are associated with the etiology and pathophysiology of phonotraumatic vocal fold lesions.
Subglottal air pressure plays a major role in voice production and is a primary factor in controlling voice onset, offset, sound pressure level, glottal airflow, vocal fold collision pressures, and variations in fundamental frequency. Previous work has shown promise for the estimation of subglottal pressure from an unobtrusive miniature accelerometer sensor attached to the anterior base of the neck during typical modal voice production across multiple pitch and vowel contexts. This study expands on that work to incorporate additional accelerometer-based measures of vocal function to compensate for non-modal phonation characteristics and achieve an improved estimation of subglottal pressure. Subjects with normal voices repeated /p/-vowel syllable strings from loud to soft levels in multiple vowel contexts (/a/, /i/, and /u/), pitch conditions (comfortable, lower than comfortable, higher than comfortable), and voice quality types (modal, breathy, strained, and rough). Subject-specific, stepwise regression models were constructed using root-mean-square (RMS) values of the accelerometer signal alone (baseline condition) and in combination with cepstral peak prominence, fundamental frequency, and glottal airflow measures derived using subglottal impedance-based inverse filtering. Five-fold cross-validation assessed the robustness of model performance using the root-mean-square error metric for each regression model. Each cross-validation fold exhibited up to a 25% decrease in prediction error when the model incorporated multi-dimensional aspects of the accelerometer signal compared with RMS-only models. Improved estimation of subglottal pressure for non-modal phonation was thus achievable, lending support to future studies of subglottal pressure estimation in patients with voice disorders and in ambulatory voice recordings.
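The five-fold RMSE evaluation can be illustrated with the RMS-only baseline, a single-predictor linear model of subglottal pressure from accelerometer RMS. This is a simplified sketch. It uses contiguous, unshuffled folds and plain least squares, not the subject-specific stepwise models described above, and the helper names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def kfold_rmse(x: Sequence[float], y: Sequence[float], k: int = 5) -> List[float]:
    """Held-out RMSE for each of k contiguous folds (no shuffling, for clarity)."""
    n = len(x)
    # Split indices 0..n-1 into k nearly equal contiguous chunks.
    bounds = [round(i * n / k) for i in range(k + 1)]
    errors = []
    for i in range(k):
        test = range(bounds[i], bounds[i + 1])
        train = [j for j in range(n) if j not in test]
        slope, intercept = fit_line([x[j] for j in train], [y[j] for j in train])
        sq = [(y[j] - (slope * x[j] + intercept)) ** 2 for j in test]
        errors.append(math.sqrt(sum(sq) / len(sq)))
    return errors
```

Comparing each fold's RMSE for this baseline with that of a model that includes additional predictors gives the percentage decrease in prediction error, 100 × (RMSE_base − RMSE_full) / RMSE_base.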
The purpose of this study was to evaluate the effects of nonmodal phonation on estimates of subglottal pressure (Ps) derived from the magnitude of a neck-surface accelerometer (ACC) signal and to confirm previous findings regarding the impact of vowel contexts and pitch levels in a larger cohort of participants.
Twenty-six vocally healthy participants (18 women, 8 men) were asked to produce a series of /p/-vowel syllables with descending loudness in 3 vowel contexts (/a/, /i/, and /u/), 3 pitch levels (comfortable, high, and low), and 4 elicited phonatory conditions (modal, breathy, strained, and rough). Estimates of Ps for each vowel segment were obtained by averaging the intraoral air pressure plateaus before and after each segment. The root-mean-square magnitude of the neck-surface ACC signal was computed for each vowel segment. Three linear mixed-effects models were used to statistically assess the effects of vowel, pitch, and phonatory condition on the linear relationship (slope and intercept) between Ps and ACC signal magnitude.
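The per-segment quantities can be sketched as follows. The sketch assumes that the intraoral pressure plateaus of the flanking /p/ closures have already been measured and that `plateaus[i]` and `plateaus[i + 1]` bracket vowel segment `i`. That data layout and the function names are hypothetical. Each returned (RMS, Ps) pair is one point for the within-participant linear fit of Ps against ACC signal magnitude.

```python
import math
from typing import List, Sequence, Tuple


def rms(segment: Sequence[float]) -> float:
    """Root-mean-square magnitude of one vowel segment of the ACC signal."""
    return math.sqrt(sum(s * s for s in segment) / len(segment))


def estimate_ps(plateau_before: float, plateau_after: float) -> float:
    """Ps estimate: mean of the intraoral pressure plateaus flanking the vowel."""
    return 0.5 * (plateau_before + plateau_after)


def segment_pairs(acc_segments: Sequence[Sequence[float]],
                  plateaus: Sequence[float]) -> List[Tuple[float, float]]:
    """Pair each segment's ACC RMS with its Ps estimate (hypothetical layout)."""
    if len(plateaus) != len(acc_segments) + 1:
        raise ValueError("expected one more plateau than vowel segments")
    return [(rms(seg), estimate_ps(plateaus[i], plateaus[i + 1]))
            for i, seg in enumerate(acc_segments)]
```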
Results demonstrated statistically significant linear relationships between ACC signal magnitude and Ps within participants but with increased intercepts for the nonmodal phonatory conditions; slopes were affected to a lesser extent. Vowel and pitch contexts did not significantly affect the linear relationship between ACC signal magnitude and Ps.
The classic linear relationship between ACC signal magnitude and Ps is significantly affected when nonmodal phonation is produced by a speaker. Future work is warranted to further characterize nonmodal phonatory characteristics to improve the ACC-based prediction of Ps during naturalistic speech production.
Ambulatory voice monitoring is a promising tool for investigating phonotraumatic vocal hyperfunction (PVH), which is associated with the development of vocal fold lesions. Because many patients with PVH are professional vocalists, a classifier was developed to better understand phonatory mechanisms during speech and singing. Twenty singers with PVH and 20 matched healthy controls were monitored with a neck-surface accelerometer–based ambulatory voice monitor. An expert-labeled ground-truth data set from 15 subject pairs was used to train a logistic regression classifier with fundamental frequency and autocorrelation peak amplitude as input features. An overall classification accuracy of 94.2% was achieved on the held-out test set.
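The following is a minimal sketch of the two input features for a single analysis frame. It estimates fo from the lag of the largest normalized autocorrelation and takes the autocorrelation peak amplitude as that maximum value, which is near 1 for strongly periodic frames. The fo search range (`fmin`, `fmax`) and the exact normalization are assumptions, not the study's settings. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def autocorr_peak(frame: Sequence[float], fs: float,
                  fmin: float = 60.0, fmax: float = 1000.0) -> Tuple[float, float]:
    """Return (fo in Hz, peak normalized autocorrelation) for one frame.

    The search covers lags corresponding to [fmin, fmax] (assumed range);
    the first lag reaching the maximum is kept.
    """
    n = len(frame)
    min_lag = max(1, int(fs / fmax))
    max_lag = min(n - 1, int(fs / fmin))
    if max_lag < min_lag:
        raise ValueError("frame too short for the requested fo range")
    best_lag, best_r = min_lag, -1.0
    for lag in range(min_lag, max_lag + 1):
        a, b = frame[: n - lag], frame[lag:]
        num = sum(x * y for x, y in zip(a, b))
        den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
        r = num / den if den > 0 else 0.0
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag, best_r
```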
Miniature high-bandwidth accelerometers on the anterior neck surface are used in laboratory and ambulatory settings to obtain vocal function measures. This study compared L1–L2 (historically, H1–H2), the widely applied difference between the log-magnitudes of the first and second harmonics, as computed from the glottal airflow waveform and as derived from the raw neck-surface acceleration signal in 79 vocally healthy female speakers. Results showed a significant correlation (r = 0.72) between L1–L2 values estimated from the airflow and accelerometer signals, suggesting that raw accelerometer-based estimates of L1–L2 may be interpreted as reflecting glottal physiological parameters and voice quality attributes during phonation.
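L1–L2 can be sketched by evaluating the DFT of a frame at fo and 2·fo and taking the difference of the resulting levels in dB. This assumes that fo is already known and that the frame spans a whole number of periods. Otherwise, spectral leakage and windowing choices matter. The function names are hypothetical.

```python
import cmath
import math
from typing import Sequence


def harmonic_magnitude(frame: Sequence[float], fs: float, freq: float) -> float:
    """Magnitude of the DFT of `frame` evaluated at a single frequency (Hz)."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq * n / fs)
              for n, x in enumerate(frame))
    return abs(acc)


def l1_minus_l2(frame: Sequence[float], fs: float, fo: float) -> float:
    """L1-L2 in dB: first-harmonic level minus second-harmonic level."""
    h1 = harmonic_magnitude(frame, fs, fo)
    h2 = harmonic_magnitude(frame, fs, 2.0 * fo)
    return 20.0 * math.log10(h1 / h2)
```

Given paired per-speaker L1–L2 values from the airflow and accelerometer signals, `statistics.correlation(x, y)` (Python 3.10+) returns Pearson's r.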
Phonotraumatic vocal hyperfunction (PVH) is associated with chronic misuse and/or abuse of voice that can result in lesions such as vocal fold nodules. Clinical aerodynamic assessment of vocal function has recently been shown to differentiate between patients with PVH and healthy controls and to provide meaningful insight into the pathophysiological mechanisms associated with these disorders. However, current clinical assessment of PVH is incomplete because it cannot objectively identify the type and extent of detrimental phonatory function associated with PVH during daily voice use. The current study sought to address this issue by incorporating, for the first time in a comprehensive ambulatory assessment, glottal airflow parameters estimated from a neck-mounted accelerometer and recorded to a smartphone-based voice monitor. We tested this approach on 48 patients with vocal fold nodules and 48 matched healthy-control subjects, each of whom wore the voice monitor for a week. Seven glottal airflow features were estimated every 50 ms using an impedance-based inverse filtering scheme, and seven high-order summary statistics of each feature were computed every 5 minutes over voiced segments. Univariate hypothesis testing showed that eight glottal airflow summary statistics differed significantly between the patient and healthy-control groups. L1-regularized logistic regression for a supervised classification task yielded a mean (standard deviation) area under the ROC curve of 0.82 (0.25) and an accuracy of 0.83 (0.14). These results outperform state-of-the-art classification for the same task and provide a new avenue to improve the assessment and treatment of hyperfunctional voice disorders.
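The L1-regularized logistic regression named above can be illustrated with proximal gradient descent (ISTA). In this method, a soft-thresholding step drives the weights of uninformative summary statistics exactly to zero. This is a small, unoptimized sketch. The learning rate, penalty `lam`, iteration count, and function names are assumptions, and features would normally be standardized first. In practice, a library implementation such as scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")` would typically be used instead.

```python
import math
from typing import List, Sequence


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t*|w|: shrinks v toward zero, exactly zero if |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_l1_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                    lam: float = 0.05, lr: float = 0.1,
                    iters: int = 2000) -> List[float]:
    """Minimize mean log-loss + lam * sum|w_j| by proximal gradient (ISTA).

    Returns [bias, w_1, ..., w_d]; the bias is not penalized.
    """
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(iters):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err / n
            for j in range(d):
                grad[j + 1] += err * xi[j] / n
        w[0] -= lr * grad[0]  # plain gradient step for the unpenalized bias
        for j in range(1, d + 1):
            w[j] = soft_threshold(w[j] - lr * grad[j], lr * lam)
    return w


def predict_proba(w: Sequence[float], x: Sequence[float]) -> float:
    """Predicted probability of the positive class (e.g., patient)."""
    return _sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))
```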
The aim of this study was to establish reliability and validity for self-ratings of vocal status obtained during the daily activities of patients with vocal hyperfunction (VH) and matched controls.
Eighty-four patients with VH and 74 participants with normal voices answered 3 vocal status questions (difficulty producing soft, high-pitched phonation [D-SHP], discomfort, and fatigue) on an ambulatory voice monitor at the beginning of each day, at 5-hr intervals, and at the end of each day over 7 days. Two subsets of the patient group answered the questions during a 2nd week after voice therapy (29 patients) or laryngeal surgery (16 patients).
Reliability was high for patients (Cronbach's α = .88) and controls (α = .95). Patients reported higher D-SHP, discomfort, and fatigue ratings (Cohen's d = 1.62–1.92) than controls. Patients posttherapy and postsurgery reported significantly improved self-ratings of vocal status relative to their pretreatment ratings (d = 0.70–1.13). Within-subject changes in self-ratings greater than 20 points were considered clinically meaningful.
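Cronbach's α for the three vocal status questions can be computed from the item variances and the variance of the summed score: α = k/(k − 1) · (1 − Σσ²_item / σ²_total). The sketch below assumes that each item is a list of ratings over the same prompts and uses sample variances. The function name is hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for k items, each a list of ratings over the same prompts."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(vals) for vals in zip(*items)]  # summed score per prompt
    item_var = sum(variance(vals) for vals in items)
    return k / (k - 1) * (1.0 - item_var / variance(totals))
```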
Ratings of D-SHP, discomfort, and fatigue have adequate reliability and validity for tracking vocal status throughout daily life in patients with VH and vocally healthy individuals. These questions could help investigate the relationship between vocal symptom variability and putative contributing factors (e.g., voice use/rest, emotions).
Ambulatory monitoring of real-world voice characteristics and behavior has the potential to inform the assessment of voice and speech disorders and of psychological and emotional state. In this paper, we report on the novel development of a lightweight, wireless voice monitor that synchronously records dual-channel data from an acoustic microphone and a neck-surface accelerometer embedded on a flex circuit. Lombard speech effects were investigated in pilot data from four adult speakers with normal vocal function who read a phonetically balanced paragraph in the presence of different ambient acoustic noise levels. Whereas the signal-to-noise ratio (SNR) of the microphone signal decreased with increasing ambient noise level, the SNR of the accelerometer signal remained high. Lombard speech properties were thus robustly computed from the accelerometer signal and observed in all four speakers, who exhibited increases in average estimates of sound pressure level (+2.3 dB), fundamental frequency (+21.4 Hz), and cepstral peak prominence (+1.3 dB) from quiet to loud ambient conditions. Future work calls for ambulatory data collection in naturalistic environments, where the microphone acts as a sound level meter and the accelerometer functions as a noise-robust voicing sensor to assess voice disorders, neurological conditions, and cognitive load.
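The SNR comparison can be sketched by estimating signal power as the power of a speech segment minus the power of a noise-only segment from the same channel. This estimator and the function names are assumptions made for illustration. The abstract does not state how SNR was computed.

```python
import math
from typing import Sequence


def mean_power(x: Sequence[float]) -> float:
    """Mean squared amplitude of a segment."""
    return sum(v * v for v in x) / len(x)


def snr_db(speech_segment: Sequence[float], noise_segment: Sequence[float]) -> float:
    """SNR in dB, assuming the speech segment contains speech plus the same noise.

    Signal power is estimated by subtracting the noise-only power; a
    non-positive difference is reported as -inf (nothing above the noise).
    """
    p_noise = mean_power(noise_segment)
    p_signal = mean_power(speech_segment) - p_noise
    if p_signal <= 0:
        return float("-inf")
    if p_noise == 0:
        return float("inf")
    return 10.0 * math.log10(p_signal / p_noise)
```

Applying the same function to the microphone and accelerometer channels at each ambient noise level would show the divergence described above.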
This article provides a summary of some recent innovations in voice assessment expected to have an impact in the next 5–10 years on how patients with voice disorders are clinically managed by speech-language pathologists. Specific innovations discussed are in the areas of laryngeal imaging, ambulatory voice monitoring, and “big data” analysis using machine learning to produce new metrics for vocal health. Also discussed is the potential for using voice analysis to detect and monitor other health conditions.
Purpose Ambulatory voice biofeedback has the potential to significantly improve voice therapy effectiveness by targeting carryover of desired behaviors outside the therapy session (i.e., retention). This study applies motor learning concepts shown to increase retention (reduced frequency and delayed, summary feedback) to ambulatory voice monitoring for training nurses to talk more softly during work hours.
Method Forty-eight nurses with normal voices wore the Voice Health Monitor (Mehta, Zañartu, Feng, Cheyne, & Hillman, 2012) for 6 days: 3 baseline days, 1 biofeedback day, 1 short-term retention day, and 1 long-term retention day. Participants were block-randomized into 3 different biofeedback groups: 100%, 25%, and Summary. Performance was measured in terms of compliance time below a participant-specific vocal intensity threshold.
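Compliance time can be sketched as the time spent in voiced frames whose estimated intensity falls below the participant-specific threshold. The 50-ms frame duration, the use of `None` for unvoiced frames, and the function name are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple


def compliance(spl_db: Sequence[Optional[float]], threshold_db: float,
               frame_s: float = 0.05) -> Tuple[float, float]:
    """Return (seconds, percent of voiced time) spent below the SPL threshold.

    Each entry is one frame's SPL estimate in dB; None marks an unvoiced
    frame, which does not count toward compliance time.
    """
    voiced = [v for v in spl_db if v is not None]
    below = sum(1 for v in voiced if v < threshold_db)
    percent = 100.0 * below / len(voiced) if voiced else 0.0
    return below * frame_s, percent
```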
Results All participants exhibited a significant increase in compliance time (Cohen's d = 4.5) during the biofeedback day compared with baseline days. The Summary feedback group exhibited a significantly smaller reduction in performance during both short-term (d = 1.14) and long-term (d = 1.04) retention days compared with the 100% feedback group.
Conclusions These findings suggest that modifications in feedback frequency and timing affect retention of a modified vocal behavior in daily life. Future work calls for studying the potential beneficial impact of ambulatory voice biofeedback in participants with behaviorally based voice disorders.
Purpose Ambulatory voice biofeedback (AVB) has the potential to significantly improve voice therapy effectiveness by targeting one of the most challenging aspects of rehabilitation: carryover of desired behaviors outside of the therapy session. Although initial evidence indicates that AVB can alter vocal behavior in daily life, retention of the new behavior after biofeedback has not been demonstrated. Motor learning studies have repeatedly shown retention-related benefits when feedback frequency is reduced or summary statistics are provided. Therefore, novel AVB settings based on these concepts have been developed and implemented.
Method The underlying theoretical framework and resultant implementation of innovative AVB settings on a smartphone-based voice monitor are described. A clinical case study demonstrates the functionality of the new relative frequency feedback capabilities.
Results With new technical capabilities, 2 aspects of feedback are directly modifiable for AVB: relative frequency and summary feedback. Although reduced-frequency AVB was associated with improved carryover of a therapeutic vocal behavior (i.e., reduced vocal intensity) in a patient post-excision of vocal fold nodules, causation cannot be assumed.
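A relative-frequency schedule can be sketched with an accumulator that cues only a fraction of threshold violations (e.g., every 4th violation at 25%). The sketch also computes a delayed summary over a monitoring block. This deterministic scheme and both function names are hypothetical. They are not the smartphone monitor's actual implementation, which the abstract does not detail.

```python
from typing import List, Sequence


def relative_frequency_schedule(violations: Sequence[bool],
                                percent: float) -> List[bool]:
    """Mark which threshold violations trigger a cue, at roughly `percent` frequency.

    Each violation adds percent/100 credit; a cue fires whenever credit
    reaches 1, so percent=25 cues every 4th violation and 100 cues all of them.
    """
    cues = []
    credit = 0.0
    for violated in violations:
        cue = False
        if violated:
            credit += percent / 100.0
            if credit >= 1.0:
                credit -= 1.0
                cue = True
        cues.append(cue)
    return cues


def summary_feedback(violations: Sequence[bool]) -> float:
    """Delayed summary: percentage of monitored frames that exceeded the threshold."""
    return 100.0 * sum(violations) / len(violations) if violations else 0.0
```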
Conclusions Timing and frequency of AVB schedules can be manipulated to empirically assess generalization of motor learning principles to vocal behavior modification and test the clinical effectiveness of AVB with various feedback schedules.
Purpose The purpose of this study was to evaluate the potential for estimating subglottal air pressure using a neck-surface accelerometer and to compare the accuracy of predicting subglottal air pressure relative to predicting acoustic sound pressure level (SPL).
Method Indirect estimates of subglottal pressure (Psg′) were obtained from 10 vocally healthy speakers during loud-to-soft repetitions of 3 different /p/–vowel gestures (/pa/, /pi/, /pu/) at 3 pitch levels in the modal register. Intraoral air pressure, neck-surface acceleration, and radiated acoustic pressure were recorded, and the root-mean-square amplitude of the acceleration signal was correlated with Psg′ and SPL.
Results The coefficient of determination between accelerometer level and Psg′ was high when data were pooled from all vowel and pitch contexts for each participant (r2 = .68–.93). These relationships were stronger than corresponding relationships between accelerometer level and SPL (r2 = .46–.81). The average 95% prediction interval for estimating Psg′ using accelerometer level was ±2.53 cm H2O, ranging from ±1.70 to ±3.74 cm H2O across participants.
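The regression quantities reported above can be sketched with ordinary least squares. The sketch computes r² from the residual sum of squares and the 95% prediction-interval half-width t · s · √(1 + 1/n + (x₀ − x̄)²/Sxx) at accelerometer level x₀. The function name is hypothetical. The caller must supply the two-sided critical t value with n − 2 degrees of freedom (e.g., from `scipy.stats.t.ppf(0.975, n - 2)`), because Python's standard library has no t quantile function.

```python
import math
from typing import Sequence, Tuple


def regression_with_pi(x: Sequence[float], y: Sequence[float], x0: float,
                       t_crit: float) -> Tuple[float, float, float]:
    """Fit y = b0 + b1*x; return (r2, prediction at x0, PI half-width at x0).

    t_crit must be the two-sided critical t value with n - 2 degrees of freedom.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    r2 = 1.0 - sse / syy
    s = math.sqrt(sse / (n - 2))  # residual standard error
    half = t_crit * s * math.sqrt(1.0 + 1.0 / n + (x0 - mx) ** 2 / sxx)
    return r2, b0 + b1 * x0, half
```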
Conclusions Accelerometer signal amplitude correlated more strongly with Psg′ than with SPL. Future work is warranted to investigate the robustness of the relationship in nonmodal voice qualities, individuals with voice disorders, and accelerometer-based ambulatory monitoring of subglottal pressure.