This study introduces the in vivo application of a Bayesian framework to estimate subglottal pressure, laryngeal muscle activation, and vocal fold contact pressure from calibrated transnasal high-speed videoendoscopy and oral airflow data. A subject-specific, lumped-element vocal fold model is estimated using an extended Kalman filter and two observation models involving glottal area and glottal airflow. Model-based inferences using data from a vocally healthy male individual are compared with empirical estimates of subglottal pressure and reference values for muscle activation and contact pressure in the literature, thus providing baseline error metrics for future clinical investigations.
Subglottal air pressure plays a major role in voice production and is a primary factor in controlling voice onset, offset, sound pressure level, glottal airflow, vocal fold collision pressures, and variations in fundamental frequency. Previous work has shown promise for the estimation of subglottal pressure from an unobtrusive miniature accelerometer sensor attached to the anterior base of the neck during typical modal voice production across multiple pitch and vowel contexts. This study expands on that work to incorporate additional accelerometer-based measures of vocal function to compensate for non-modal phonation characteristics and achieve an improved estimation of subglottal pressure. Subjects with normal voices repeated /p/-vowel syllable strings from loud-to-soft levels in multiple vowel contexts (/a/, /i/, and /u/), pitch conditions (comfortable, lower than comfortable, higher than comfortable), and voice quality types (modal, breathy, strained, and rough). Subject-specific, stepwise regression models were constructed using root-mean-square (RMS) values of the accelerometer signal alone (baseline condition) and in combination with cepstral peak prominence, fundamental frequency, and glottal airflow measures derived using subglottal impedance-based inverse filtering. Five-fold cross-validation assessed the robustness of model performance using the root-mean-square error metric for each regression model. Each cross-validation fold exhibited up to a 25% decrease in prediction error when the model incorporated multi-dimensional aspects of the accelerometer signal compared with RMS-only models. Improved estimation of subglottal pressure for non-modal phonation was thus achievable, supporting future studies of subglottal pressure estimation in patients with voice disorders and in ambulatory voice recordings.
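The comparison between RMS-only and multi-feature regression models under five-fold cross-validation can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, the helper names (`solve`, `fit`, `predict`, `cv_rmse`) are hypothetical, cepstral peak prominence stands in for the full set of added features, and ordinary least squares replaces the stepwise selection procedure used in the study.

```python
import math
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coef, row):
    """Evaluate the fitted linear model for one feature row."""
    return coef[0] + sum(c * v for c, v in zip(coef[1:], row))

def cv_rmse(rows, y, k=5):
    """Return the root-mean-square error of each of k interleaved folds."""
    n = len(y)
    errors = []
    for fold in range(k):
        test_idx = list(range(fold, n, k))
        held_out = set(test_idx)
        train_rows = [rows[i] for i in range(n) if i not in held_out]
        train_y = [y[i] for i in range(n) if i not in held_out]
        coef = fit(train_rows, train_y)
        sq = [(y[i] - predict(coef, rows[i])) ** 2 for i in test_idx]
        errors.append(math.sqrt(sum(sq) / len(sq)))
    return errors

# Synthetic, hypothetical data: pressure depends on two accelerometer features.
random.seed(0)
acc_rms = [random.uniform(50, 80) for _ in range(50)]
cpp = [random.uniform(5, 20) for _ in range(50)]
ps = [0.2 * a - 0.3 * c + random.gauss(0, 0.2) for a, c in zip(acc_rms, cpp)]

baseline = cv_rmse([[a] for a in acc_rms], ps)                    # RMS-only model
extended = cv_rmse([[a, c] for a, c in zip(acc_rms, cpp)], ps)    # RMS plus one added feature
```

Comparing `baseline` and `extended` fold by fold mirrors the per-fold error comparison described above, although the size of any improvement in this sketch reflects only how the synthetic data were generated.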
Excessive vocal fold collision pressures during phonation are considered to play a primary role in the formation of benign vocal fold lesions, such as nodules. The ability to accurately and reliably acquire intraglottal pressure has the potential to provide unique insights into the pathophysiology of phonotrauma. Difficulties arise, however, in directly measuring vocal fold contact pressures due to physical intrusion from the sensor that may disrupt the contact mechanics, as well as difficulty in determining probe/sensor position relative to the contact location. These issues are quantified and addressed through the implementation of a novel approach for identifying the timing and location of vocal fold contact, and measuring intraglottal and vocal fold contact pressures via a pressure probe embedded in the wall of a hemi-laryngeal flow facility. The accuracy and sensitivity of the pressure measurements are validated against ground truth values. Applicability to in vivo approaches is assessed by acquiring intraglottal and vocal fold contact pressures using a synthetic, self-oscillating vocal fold model in a hemi-laryngeal configuration, where the sensitivity of the measured intraglottal and vocal fold contact pressure relative to the sensor position is explored.
The purpose of this study was to evaluate the effects of nonmodal phonation on estimates of subglottal pressure (Ps) derived from the magnitude of a neck-surface accelerometer (ACC) signal and to confirm previous findings regarding the impact of vowel contexts and pitch levels in a larger cohort of participants.
Twenty-six vocally healthy participants (18 women, 8 men) were asked to produce a series of p-vowel syllables with descending loudness in 3 vowel contexts (/a/, /i/, and /u/), 3 pitch levels (comfortable, high, and low), and 4 elicited phonatory conditions (modal, breathy, strained, and rough). Estimates of Ps for each vowel segment were obtained by averaging the intraoral air pressure plateau before and after each segment. The root-mean-square magnitude of the neck-surface ACC signal was computed for each vowel segment. Three linear mixed-effects models were used to statistically assess the effects of vowel, pitch, and phonatory condition on the linear relationship (slope and intercept) between Ps and ACC signal magnitude.
Results demonstrated statistically significant linear relationships between ACC signal magnitude and Ps within participants but with increased intercepts for the nonmodal phonatory conditions; slopes were affected to a lesser extent. Vowel and pitch contexts did not significantly affect the linear relationship between ACC signal magnitude and Ps.
The classic linear relationship between ACC signal magnitude and Ps is significantly affected when nonmodal phonation is produced by a speaker. Future work is warranted to further characterize nonmodal phonatory characteristics to improve the ACC-based prediction of Ps during naturalistic speech production.
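The two per-segment quantities described in this study, the RMS magnitude of the ACC signal and the Ps estimate interpolated from the flanking intraoral pressure plateaus, can be sketched as follows. The function names and sample values are hypothetical, and averaging the mean of each plateau is an assumed reading of "averaging the intraoral air pressure plateau before and after each segment".

```python
import math

def rms(samples):
    """Root-mean-square magnitude of one signal segment."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))

def estimate_ps(plateau_before, plateau_after):
    """Average the mean intraoral pressure of the plateaus flanking a vowel segment."""
    mean_before = sum(plateau_before) / len(plateau_before)
    mean_after = sum(plateau_after) / len(plateau_after)
    return 0.5 * (mean_before + mean_after)

# Hypothetical values: ACC samples in arbitrary units, pressures in cm H2O.
acc_segment = [0.3, -0.4, 0.3, -0.4]
acc_rms = rms(acc_segment)
ps = estimate_ps([8.0, 8.2], [7.6, 7.8])
```

Each vowel segment then contributes one (`acc_rms`, `ps`) pair to the linear relationship whose slope and intercept are tested across conditions.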
Purpose: Prior investigations suggest that simultaneous performance of more than 1 motor-oriented task may exacerbate speech motor deficits in individuals with Parkinson disease (PD). The purpose of the current investigation was to examine the extent to which performing a low-demand manual task affected the connected speech in individuals with and without PD. Method: Individuals with PD and neurologically healthy controls performed speech tasks (reading and extemporaneous speech tasks) and an oscillatory manual task (a counterclockwise circle-drawing task) in isolation (single-task condition) and concurrently (dual-task condition). Results: Relative to speech task performance, no changes in speech acoustics were observed for either group when the low-demand motor task was performed with the concurrent reading tasks. Speakers with PD exhibited a significant decrease in pause duration between the single-task (speech only) and dual-task conditions for the extemporaneous speech task, whereas control participants did not exhibit changes in any speech production variable between the single- and dual-task conditions. Conclusions: Overall, there were little to no changes in speech production when a low-demand oscillatory motor task was performed with concurrent reading. For the extemporaneous task, however, individuals with PD exhibited significant changes when the speech and manual tasks were performed concurrently, a pattern that was not observed for control speakers. Supplemental Material: https://doi.org/10.23641/asha.8637008
Purpose: The purpose of the current study was to characterize clear speech production for speakers with and without Parkinson disease (PD) using several measures of working vowel space computed from frequently sampled formant trajectories. Method: The 1st 2 formant frequencies were tracked for a reading passage that was produced using habitual and clear speaking styles by 15 speakers with PD and 15 healthy control speakers. Vowel space metrics were calculated from the distribution of frequently sampled formant frequency tracks, including vowel space hull area, articulatory–acoustic vowel space, and multiple vowel space density (VSD) measures based on different percentile contours of the formant density distribution. Results: Both speaker groups exhibited significant increases in the articulatory–acoustic vowel space and VSD10, the area of the outermost (10th percentile) contour of the formant density distribution, from habitual to clear styles. These clarity-related vowel space increases were significantly smaller for speakers with PD than controls. Both groups also exhibited a significant increase in vowel space hull area; however, this metric was not sensitive to differences in the clear speech response between groups. Relative to healthy controls, speakers with PD exhibited a significantly smaller VSD90, the area of the most central (90th percentile), densely populated region of the formant space. Conclusions: Using vowel space metrics calculated from formant traces of the reading passage, the current work suggests that speakers with PD do indeed reach the more peripheral regions of the vowel space during connected speech but spend a larger percentage of the time in more central regions of formant space than healthy speakers. 
Additionally, working vowel space metrics based on the distribution of formant data suggested that speakers with PD exhibited less of a clarity-related increase in formant space than controls, a trend that was not observed for perimeter-based measures of vowel space area.
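A perimeter-based measure such as vowel space hull area can be sketched as the area of a hull enclosing all sampled (F1, F2) points. The sketch below assumes a convex hull, which may differ from the hull definition used in the study, and the function names are hypothetical.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Return hull vertices in counterclockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

# Hypothetical (F1, F2) samples in Hz taken from a formant track.
formants = [(300, 2300), (700, 1200), (350, 800), (500, 1500), (450, 1400)]
hull_area = polygon_area(convex_hull(formants))  # area in Hz squared
```

Because the hull depends only on the outermost samples, a few peripheral points determine the whole area, which is consistent with the observation that perimeter-based measures were less sensitive to group differences than density-based ones.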
Phonotraumatic vocal hyperfunction (PVH) is associated with chronic misuse and/or abuse of voice that can result in lesions such as vocal fold nodules. The clinical aerodynamic assessment of vocal function has been recently shown to differentiate between patients with PVH and healthy controls to provide meaningful insight into pathophysiological mechanisms associated with these disorders. However, all current clinical assessment of PVH is incomplete because of its inability to objectively identify the type and extent of detrimental phonatory function that is associated with PVH during daily voice use. The current study sought to address this issue by incorporating, for the first time in a comprehensive ambulatory assessment, glottal airflow parameters estimated from a neck-mounted accelerometer and recorded to a smartphone-based voice monitor. We tested this approach on 48 patients with vocal fold nodules and 48 matched healthy-control subjects who each wore the voice monitor for a week. Seven glottal airflow features were estimated every 50 ms using an impedance-based inverse filtering scheme, and seven high-order summary statistics of each feature were computed every 5 minutes over voiced segments. Based on univariate hypothesis testing, eight glottal airflow summary statistics were found to be statistically different between patient and healthy-control groups. L1-regularized logistic regression for a supervised classification task yielded a mean (standard deviation) area under the ROC curve of 0.82 (0.25) and an accuracy of 0.83 (0.14). These results outperform the state-of-the-art classification for the same classification task and provide a new avenue to improve the assessment and treatment of hyperfunctional voice disorders.
The purpose of this study was to determine the validity of preliminary reports showing that glottal aerodynamic measures can identify pathophysiological phonatory mechanisms for phonotraumatic and nonphonotraumatic vocal hyperfunction, which are each distinctly different from normal vocal function.
Glottal aerodynamic measures (estimates of subglottal air pressure, peak-to-peak airflow, maximum flow declination rate, and open quotient) were obtained noninvasively using a pneumotachograph mask with an intraoral pressure catheter in 16 women with organic vocal fold lesions, 16 women with muscle tension dysphonia, and 2 associated matched control groups with normal voices. Subjects produced /pae/ syllable strings from which glottal airflow was estimated using inverse filtering during /ae/ vowels, and subglottal pressure was estimated during /p/ closures. All measures were normalized for sound pressure level (SPL) and statistically tested for differences between patient and control groups.
All SPL-normalized measures were significantly lower in the phonotraumatic group as compared with measures in its control group. For the nonphonotraumatic group, only SPL-normalized subglottal pressure and open quotient were significantly lower than measures in its control group.
Results of this study confirm previous hypotheses and preliminary results indicating that SPL-normalized estimates of glottal aerodynamic measures can be used to describe the different pathophysiological phonatory mechanisms associated with phonotraumatic and nonphonotraumatic vocal hyperfunction.
Glottal inverse filtering aims to estimate the glottal airflow signal from a speech signal for applications such as speaker recognition and clinical voice assessment. Nonetheless, evaluation of inverse filtering algorithms has been challenging due to the practical difficulties of directly measuring glottal airflow. Apart from this, it is acknowledged that the performance of many methods degrades in voice conditions that are of great interest, such as breathiness, high pitch, soft voice, and running speech. This paper presents a comprehensive, objective, and comparative evaluation of state-of-the-art inverse filtering algorithms that takes advantage of speech and glottal airflow signals generated by a physiological speech synthesizer. The synthesizer provides a physics-based simulation of the voice production process and thus an adequate test bed for revealing the temporal and spectral performance characteristics of each algorithm. Included in the synthetic data are continuous speech utterances and sustained vowels, which are produced with multiple voice qualities (pressed, slightly pressed, modal, slightly breathy, and breathy), fundamental frequencies, and subglottal pressures to simulate the natural variations in real speech. In evaluating the accuracy of a glottal flow estimate, multiple error measures are used, including an error in the estimated signal that measures overall waveform deviation, as well as an error in each of several clinically relevant features extracted from the glottal flow estimate. Waveform errors calculated from glottal flow estimation experiments exhibited mean values around 30% for sustained vowels, and around 40% for continuous speech, of the amplitude of the true glottal flow derivative. Closed-phase approaches showed remarkable stability across different voice qualities and subglottal pressures.
The algorithms of choice, as suggested by significance tests, are closed-phase covariance analysis for the analysis of sustained vowels, and sparse linear prediction for the analysis of continuous speech. Results of data subset analysis suggest that analysis of close rounded vowels is an additional challenge in glottal flow estimation.
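A waveform error expressed as a fraction of the true signal's amplitude can be sketched as below. The exact error definition used in the evaluation is not given here, so normalizing the RMS deviation by the peak-to-peak amplitude of the reference is an assumption, and the function name and sample values are hypothetical.

```python
import math

def normalized_waveform_error(estimate, reference):
    """RMS deviation between estimate and reference, as a fraction of the
    reference's peak-to-peak amplitude (an assumed normalization)."""
    if len(estimate) != len(reference):
        raise ValueError("signals must have the same length")
    mean_sq = sum((e - r) ** 2 for e, r in zip(estimate, reference)) / len(reference)
    return math.sqrt(mean_sq) / (max(reference) - min(reference))

# Hypothetical glottal flow derivative samples (true signal and an estimate).
reference = [0.0, 1.0, 0.0, -1.0]
estimate = [0.1, 0.9, 0.1, -0.9]
error = normalized_waveform_error(estimate, reference)
```

With a synthesizer providing the true glottal flow derivative as `reference`, an error of this kind can be computed for each algorithm's `estimate` and averaged across utterances.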
Purpose: The purpose of this study was to evaluate the potential for estimating subglottal air pressure using a neck-surface accelerometer and to compare the accuracy of predicting subglottal air pressure relative to predicting acoustic sound pressure level (SPL).
Method: Indirect estimates of subglottal pressure (Psg′) were obtained from 10 vocally healthy speakers during loud-to-soft repetitions of 3 different /p/–vowel gestures (/pa/, /pi/, /pu/) at 3 pitch levels in the modal register. Intraoral air pressure, neck-surface acceleration, and radiated acoustic pressure were recorded, and the root-mean-square amplitude of the acceleration signal was correlated with Psg′ and SPL.
Results: The coefficient of determination between accelerometer level and Psg′ was high when data were pooled from all vowel and pitch contexts for each participant (r² = .68–.93). These relationships were stronger than corresponding relationships between accelerometer level and SPL (r² = .46–.81). The average 95% prediction interval for estimating Psg′ using accelerometer level was ±2.53 cm H2O, ranging from ±1.70 to ±3.74 cm H2O across participants.
Conclusions: Accelerometer signal amplitude correlated more strongly with Psg′ than with SPL. Future work is warranted to investigate the robustness of the relationship in nonmodal voice qualities, individuals with voice disorders, and accelerometer-based ambulatory monitoring of subglottal pressure.
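The per-participant statistics reported above, a coefficient of determination and a 95% prediction interval for a simple linear regression, can be sketched as follows. The data values and function names are hypothetical, and the interval uses a normal quantile as a large-sample approximation where an exact analysis would use a Student t quantile with n − 2 degrees of freedom.

```python
import math
from statistics import NormalDist, mean

def fit_line(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, ybar - slope * xbar

def residual_ss(x, y, slope, intercept):
    """Sum of squared residuals about the fitted line."""
    return sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line."""
    ybar = mean(y)
    ss_tot = sum((yi - ybar) ** 2 for yi in y)
    return 1.0 - residual_ss(x, y, slope, intercept) / ss_tot

def prediction_half_width(x, y, slope, intercept, x0, level=0.95):
    """Approximate half-width of the prediction interval at x0 (normal quantile)."""
    n = len(x)
    xbar = mean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    s = math.sqrt(residual_ss(x, y, slope, intercept) / (n - 2))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return z * s * math.sqrt(1.0 + 1.0 / n + (x0 - xbar) ** 2 / sxx)

# Hypothetical data: accelerometer level (dB) and estimated pressure (cm H2O).
acc_level = [60.0, 64.0, 68.0, 72.0, 76.0, 80.0]
psg = [4.1, 5.2, 5.8, 7.1, 7.9, 9.2]
slope, intercept = fit_line(acc_level, psg)
r2 = r_squared(acc_level, psg, slope, intercept)
half_width = prediction_half_width(acc_level, psg, slope, intercept, x0=70.0)
```

The half-width is the "±" value attached to a new pressure prediction at a given accelerometer level. With few samples, the normal quantile makes the interval narrower than the exact t-based one.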
Monitoring subglottal neck-surface acceleration has received renewed attention due to the ability of low-profile accelerometers to confidentially and noninvasively track properties related to normal and disordered voice characteristics and behavior. This study investigated the ability of subglottal neck-surface acceleration to yield vocal function measures traditionally derived from the acoustic voice signal and help guide the development of clinically functional accelerometer-based measures from a physiological perspective. Results are reported for 82 adult speakers with voice disorders and 52 adult speakers with normal voices who produced the sustained vowels /A/, /i/, and /u/ at a comfortable pitch and loudness during the simultaneous recording of radiated acoustic pressure and subglottal neck-surface acceleration. As expected, timing-related measures of jitter exhibited the strongest correlation between acoustic and neck-surface acceleration waveforms (r 0.99), whereas amplitude-based measures of shimmer correlated less strongly (r 0.74). Additionally, weaker correlations were exhibited by spectral measures of harmonics-to-noise ratio (r 0.69) and tilt (r 0.57), whereas the cepstral peak prominence correlated more strongly (r 0.90). These empirical relationships provide evidence to support the use of accelerometers as effective complements to acoustic recordings in the assessment and monitoring of vocal function in the laboratory, clinic, and during an individual’s daily activities.
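Comparing a measure derived from the two waveforms comes down to computing the measure on each signal and correlating the paired values across speakers. The sketch below assumes the common "local" definition of jitter (mean absolute difference between consecutive periods divided by the mean period), which may differ from the definition used in the study. The function names and sample values are hypothetical.

```python
import math

def local_jitter(periods):
    """Mean absolute difference of consecutive periods, relative to the mean period."""
    diffs = [abs(a - b) for a, b in zip(periods[1:], periods[:-1])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))

def pearson_r(x, y):
    """Pearson correlation coefficient between paired measurements."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-speaker jitter values from the acoustic and accelerometer signals.
jitter_acoustic = [0.004, 0.011, 0.007, 0.020, 0.015]
jitter_acc = [0.005, 0.010, 0.007, 0.021, 0.014]
r = pearson_r(jitter_acoustic, jitter_acc)
```

The same `pearson_r` call applies to shimmer, harmonics-to-noise ratio, spectral tilt, and cepstral peak prominence once each has been computed from both waveforms.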