Source/Filter Estimation

2019
M. Motie-Shirazi, et al., “Toward development of a vocal fold contact pressure probe: Sensor characterization and validation using synthetic vocal fold models,” Applied Sciences, vol. 9, no. 15, pp. 3002, 2019.
Excessive vocal fold collision pressures during phonation are considered to play a primary role in the formation of benign vocal fold lesions, such as nodules. The ability to accurately and reliably acquire intraglottal pressure has the potential to provide unique insights into the pathophysiology of phonotrauma. Difficulties arise, however, in directly measuring vocal fold contact pressures, because the sensor may physically intrude on and disrupt the contact mechanics and because it is difficult to determine the probe/sensor position relative to the contact location. These issues are quantified and addressed by implementing a novel approach for identifying the timing and location of vocal fold contact and for measuring intraglottal and vocal fold contact pressures via a pressure probe embedded in the wall of a hemi-laryngeal flow facility. The accuracy and sensitivity of the pressure measurements are validated against ground-truth values. Applicability to in vivo approaches is assessed by acquiring intraglottal and vocal fold (VF) contact pressures using a synthetic, self-oscillating vocal fold model in a hemi-laryngeal configuration, in which the sensitivity of the measured intraglottal and VF contact pressures to sensor position is explored.
K. L. Marks, J. Z. Lin, A. Fox, L. E. Toles, and D. D. Mehta, “Impact of non-modal phonation on estimates of subglottal pressure from neck-surface acceleration in healthy speakers,” Journal of Speech, Language, and Hearing Research, vol. 62, no. 9, pp. 3339-3358, 2019.

Purpose

The purpose of this study was to evaluate the effects of nonmodal phonation on estimates of subglottal pressure (Ps) derived from the magnitude of a neck-surface accelerometer (ACC) signal and to confirm previous findings regarding the impact of vowel contexts and pitch levels in a larger cohort of participants.

Method

Twenty-six vocally healthy participants (18 women, 8 men) were asked to produce a series of p-vowel syllables with descending loudness in 3 vowel contexts (/a/, /i/, and /u/), 3 pitch levels (comfortable, high, and low), and 4 elicited phonatory conditions (modal, breathy, strained, and rough). Estimates of Ps for each vowel segment were obtained by averaging the intraoral air pressure plateau before and after each segment. The root-mean-square magnitude of the neck-surface ACC signal was computed for each vowel segment. Three linear mixed-effects models were used to statistically assess the effects of vowel, pitch, and phonatory condition on the linear relationship (slope and intercept) between Ps and ACC signal magnitude.

Results

Results demonstrated statistically significant linear relationships between ACC signal magnitude and Ps within participants but with increased intercepts for the nonmodal phonatory conditions; slopes were affected to a lesser extent. Vowel and pitch contexts did not significantly affect the linear relationship between ACC signal magnitude and Ps.

Conclusion

The classic linear relationship between ACC signal magnitude and Ps is significantly affected when nonmodal phonation is produced by a speaker. Future work is warranted to further characterize nonmodal phonatory characteristics to improve the ACC-based prediction of Ps during naturalistic speech production.
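As an illustration of the per-participant relationship described in this abstract, the following Python sketch computes the root-mean-square (RMS) magnitude of each vowel segment's ACC samples and fits an ordinary least-squares line relating Ps to that magnitude. It is a minimal sketch under stated assumptions, not the study's analysis: it fits a single speaker with plain least squares rather than the linear mixed-effects models described above, it keeps ACC magnitude on a linear scale (converting to dB first would be an equally plausible choice), and the function names and example values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def rms(samples: Sequence[float]) -> float:
    """RMS magnitude of one vowel segment of the ACC signal (hypothetical helper)."""
    if not samples:
        raise ValueError("segment is empty")
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares (slope, intercept) for y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical data: three vowel segments and their Ps estimates (cm H2O).
segments: List[List[float]] = [[0.1, -0.1, 0.1, -0.1],
                               [0.2, -0.2, 0.2, -0.2],
                               [0.3, -0.3, 0.3, -0.3]]
acc_magnitude = [rms(seg) for seg in segments]
slope, intercept = fit_line(acc_magnitude, [5.0, 7.0, 9.0])
```

Fitting each phonatory condition separately and comparing the resulting slopes and intercepts mirrors, in a much simpler form, the kind of comparison the study reports.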

J. A. Whitfield, Z. Kriegel, A. M. Fullenkamp, and D. D. Mehta, “Effects of concurrent manual task performance on connected speech acoustics in individuals with Parkinson disease,” Journal of Speech, Language, and Hearing Research, vol. 62, no. 7, pp. 2099–2117, 2019.
Purpose: Prior investigations suggest that simultaneous performance of more than 1 motor-oriented task may exacerbate speech motor deficits in individuals with Parkinson disease (PD). The purpose of the current investigation was to examine the extent to which performing a low-demand manual task affected the connected speech in individuals with and without PD.
Method: Individuals with PD and neurologically healthy controls performed speech tasks (reading and extemporaneous speech tasks) and an oscillatory manual task (a counterclockwise circle-drawing task) in isolation (single-task condition) and concurrently (dual-task condition).
Results: Relative to speech task performance, no changes in speech acoustics were observed for either group when the low-demand motor task was performed with the concurrent reading tasks. Speakers with PD exhibited a significant decrease in pause duration between the single-task (speech only) and dual-task conditions for the extemporaneous speech task, whereas control participants did not exhibit changes in any speech production variable between the single- and dual-task conditions.
Conclusions: Overall, there were little to no changes in speech production when a low-demand oscillatory motor task was performed with concurrent reading. For the extemporaneous task, however, individuals with PD exhibited significant changes when the speech and manual tasks were performed concurrently, a pattern that was not observed for control speakers.
Supplemental Material: https://doi.org/10.23641/asha.8637008
J. A. Whitfield and D. D. Mehta, “Examination of clear speech in Parkinson disease using passage-level vowel space metrics,” Journal of Speech, Language, and Hearing Research, vol. 62, no. 7, pp. 2082–2098, 2019.
Purpose: The purpose of the current study was to characterize clear speech production for speakers with and without Parkinson disease (PD) using several measures of working vowel space computed from frequently sampled formant trajectories.
Method: The 1st 2 formant frequencies were tracked for a reading passage that was produced using habitual and clear speaking styles by 15 speakers with PD and 15 healthy control speakers. Vowel space metrics were calculated from the distribution of frequently sampled formant frequency tracks, including vowel space hull area, articulatory–acoustic vowel space, and multiple vowel space density (VSD) measures based on different percentile contours of the formant density distribution.
Results: Both speaker groups exhibited significant increases in the articulatory–acoustic vowel space and VSD10, the area of the outermost (10th percentile) contour of the formant density distribution, from habitual to clear styles. These clarity-related vowel space increases were significantly smaller for speakers with PD than for controls. Both groups also exhibited a significant increase in vowel space hull area; however, this metric was not sensitive to differences in the clear speech response between groups. Relative to healthy controls, speakers with PD exhibited a significantly smaller VSD90, the area of the most central (90th percentile), densely populated region of the formant space.
Conclusions: Using vowel space metrics calculated from formant traces of the reading passage, the current work suggests that speakers with PD do indeed reach the more peripheral regions of the vowel space during connected speech but spend a larger percentage of the time in more central regions of formant space than healthy speakers. Additionally, working vowel space metrics based on the distribution of formant data suggested that speakers with PD exhibited less of a clarity-related increase in formant space than controls, a trend that was not observed for perimeter-based measures of vowel space area.
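Of the metrics listed, vowel space hull area is the simplest to illustrate: it is the area of the convex hull enclosing all sampled (F1, F2) points. The Python sketch below assumes formant tracks are already available as (F1, F2) pairs in Hz and uses Andrew's monotone-chain hull with the shoelace formula; the density-based VSD and articulatory–acoustic vowel space metrics are not shown, and the function names and points are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (F1 in Hz, F2 in Hz)

def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive means a counterclockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points: List[Point]) -> List[Point]:
    """Andrew's monotone chain; returns hull vertices in counterclockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Last point of each chain is the first point of the other chain.
    return lower[:-1] + upper[:-1]

def hull_area(points: List[Point]) -> float:
    """Shoelace area of the convex hull, in Hz^2."""
    hull = convex_hull(points)
    if len(hull) < 3:
        return 0.0
    twice_area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# Hypothetical formant samples: four corners plus one interior point.
samples = [(300.0, 1000.0), (800.0, 1000.0), (800.0, 2000.0),
           (300.0, 2000.0), (500.0, 1500.0)]
area = hull_area(samples)
```

Because a handful of peripheral points can set the hull, this metric says nothing about how much time is spent near the center of the formant space, which is what the density-based measures are designed to capture.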
2018
J. P. Cortés, et al., “Ambulatory assessment of phonotraumatic vocal hyperfunction using glottal airflow measures estimated from neck-surface acceleration,” PLoS One, vol. 13, no. 12, pp. e0209017, 2018.
Phonotraumatic vocal hyperfunction (PVH) is associated with chronic misuse and/or abuse of voice that can result in lesions such as vocal fold nodules. The clinical aerodynamic assessment of vocal function has recently been shown to differentiate between patients with PVH and healthy controls and to provide meaningful insight into pathophysiological mechanisms associated with these disorders. However, current clinical assessment of PVH is incomplete because it cannot objectively identify the type and extent of detrimental phonatory function associated with PVH during daily voice use. The current study sought to address this issue by incorporating, for the first time in a comprehensive ambulatory assessment, glottal airflow parameters estimated from a neck-mounted accelerometer and recorded to a smartphone-based voice monitor. We tested this approach on 48 patients with vocal fold nodules and 48 matched healthy-control subjects who each wore the voice monitor for a week. Seven glottal airflow features were estimated every 50 ms using an impedance-based inverse filtering scheme, and seven high-order summary statistics of each feature were computed every 5 minutes over voiced segments. Based on univariate hypothesis testing, eight glottal airflow summary statistics were found to differ significantly between the patient and healthy-control groups. L1-regularized logistic regression for a supervised classification task yielded a mean (standard deviation) area under the ROC curve of 0.82 (0.25) and an accuracy of 0.83 (0.14). These results outperform the state-of-the-art classification for the same task and provide a new avenue to improve the assessment and treatment of hyperfunctional voice disorders.
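The two classification figures reported here, area under the ROC curve (AUC) and accuracy, can be computed from any classifier's output scores. The Python sketch below assumes scores in [0, 1] with label 1 for patients and 0 for controls (whether the unit of classification is a subject or a recording window is left open here); it does not reproduce the L1-regularized logistic regression, the feature extraction, or the averaging behind the reported mean (standard deviation) values, and the 0.5 decision threshold and function names are assumptions.

```python
from typing import Sequence

def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as P(random positive scores above random negative); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def accuracy(scores: Sequence[float], labels: Sequence[int],
             threshold: float = 0.5) -> float:
    """Fraction of examples whose thresholded score matches the label."""
    correct = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return correct / len(labels)

# Hypothetical scores for three patients (label 1) and two controls (label 0).
scores = [0.9, 0.8, 0.4, 0.3, 0.6]
labels = [1, 1, 1, 0, 0]
auc = roc_auc(scores, labels)   # 5 of 6 patient/control pairs ordered correctly
acc = accuracy(scores, labels)  # 3 of 5 examples classified correctly at 0.5
```

The pairwise loop is quadratic in the number of examples; it is written this way because it states the AUC definition directly, not for efficiency.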
2017
T. F. Quatieri, et al., “Multimodal biomarkers to discriminate cognitive state,” in The Role of Technology in Clinical Neuropsychology, R. L. Kane and T. D. Parsons, Eds. Oxford University Press, 2017, pp. 409–443.
V. M. Espinoza, D. D. Mehta, J. H. Van Stan, R. E. Hillman, and M. Zañartu, “Uncertainty of glottal airflow estimation during continuous speech using impedance-based inverse filtering of the neck-surface acceleration signal,” Proceedings of the Acoustical Society of America, 2017.
V. M. Espinoza, M. Zañartu, J. H. Van Stan, D. D. Mehta, and R. E. Hillman, “Glottal aerodynamic measures in adult females with phonotraumatic and non-phonotraumatic vocal hyperfunction,” Journal of Speech, Language, and Hearing Research, vol. 60, no. 8, pp. 2159-2169, 2017.

PURPOSE:

The purpose of this study was to determine the validity of preliminary reports showing that glottal aerodynamic measures can identify pathophysiological phonatory mechanisms for phonotraumatic and nonphonotraumatic vocal hyperfunction, which are each distinctly different from normal vocal function.

METHOD:

Glottal aerodynamic measures (estimates of subglottal air pressure, peak-to-peak airflow, maximum flow declination rate, and open quotient) were obtained noninvasively using a pneumotachograph mask with an intraoral pressure catheter in 16 women with organic vocal fold lesions, 16 women with muscle tension dysphonia, and 2 associated matched control groups with normal voices. Subjects produced /pae/ syllable strings from which glottal airflow was estimated using inverse filtering during /ae/ vowels, and subglottal pressure was estimated during /p/ closures. All measures were normalized for sound pressure level (SPL) and statistically tested for differences between patient and control groups.

RESULTS:

All SPL-normalized measures were significantly lower in the phonotraumatic group as compared with measures in its control group. For the nonphonotraumatic group, only SPL-normalized subglottal pressure and open quotient were significantly lower than measures in its control group.

CONCLUSIONS:

Results of this study confirm previous hypotheses and preliminary results indicating that SPL-normalized estimates of glottal aerodynamic measures can be used to describe the different pathophysiological phonatory mechanisms associated with phonotraumatic and nonphonotraumatic vocal hyperfunction.
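Three of the measures named in the Method, peak-to-peak airflow, maximum flow declination rate (MFDR), and open quotient, can be sketched for a single cycle of inverse-filtered glottal airflow. This is an illustrative Python sketch, not the study's analysis: it assumes the flow has already been inverse filtered and segmented into exactly one period, it estimates the derivative by a first difference, and its open-quotient criterion (flow above 10% of the peak-to-peak range) is one of several conventions, chosen here as a hypothetical default. SPL normalization is not shown because the abstract does not specify how it was done.

```python
from typing import Dict, Sequence

def cycle_measures(flow: Sequence[float], fs: float,
                   oq_threshold: float = 0.1) -> Dict[str, float]:
    """Peak-to-peak flow, MFDR, and open quotient for ONE glottal cycle.

    flow: inverse-filtered glottal airflow samples spanning exactly one period.
    fs: sampling rate in Hz.
    oq_threshold: hypothetical criterion; samples above
        min + oq_threshold * (max - min) are counted as "open".
    """
    if len(flow) < 2:
        raise ValueError("need at least two samples")
    fmin, fmax = min(flow), max(flow)
    peak_to_peak = fmax - fmin
    # First-difference estimate of the flow derivative (flow units per second).
    derivative = [(b - a) * fs for a, b in zip(flow[:-1], flow[1:])]
    mfdr = -min(derivative)  # steepest decrease, reported as a positive number
    level = fmin + oq_threshold * peak_to_peak
    open_quotient = sum(1 for f in flow if f > level) / len(flow)
    return {"peak_to_peak": peak_to_peak, "mfdr": mfdr,
            "open_quotient": open_quotient}

# Toy single cycle (L/s) sampled at 10 kHz: closed, opening, closing, closed.
cycle = [0.0, 0.0, 0.0, 0.0, 0.1, 0.2, 0.3, 0.2, 0.0, 0.0]
measures = cycle_measures(cycle, fs=10_000.0)
```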
Y.-R. Chien, D. D. Mehta, Jón Guðnason, M. Zañartu, and T. F. Quatieri, “Evaluation of glottal inverse filtering algorithms using a physiologically based articulatory speech synthesizer,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 25, no. 8, pp. 1718-1730, 2017.
Glottal inverse filtering aims to estimate the glottal airflow signal from a speech signal for applications such as speaker recognition and clinical voice assessment. Nonetheless, evaluation of inverse filtering algorithms has been challenging due to the practical difficulties of directly measuring glottal airflow. Moreover, the performance of many methods is acknowledged to degrade in voice conditions that are of great interest, such as breathiness, high pitch, soft voice, and running speech. This paper presents a comprehensive, objective, and comparative evaluation of state-of-the-art inverse filtering algorithms that takes advantage of speech and glottal airflow signals generated by a physiological speech synthesizer. The synthesizer provides a physics-based simulation of the voice production process and thus an adequate test bed for revealing the temporal and spectral performance characteristics of each algorithm. Included in the synthetic data are continuous speech utterances and sustained vowels, which are produced with multiple voice qualities (pressed, slightly pressed, modal, slightly breathy, and breathy), fundamental frequencies, and subglottal pressures to simulate the natural variations in real speech. In evaluating the accuracy of a glottal flow estimate, multiple error measures are used, including an error in the estimated signal that measures overall waveform deviation, as well as an error in each of several clinically relevant features extracted from the glottal flow estimate. Waveform errors calculated from glottal flow estimation experiments had mean values of around 30% for sustained vowels and around 40% for continuous speech, relative to the amplitude of the true glottal flow derivative. Closed-phase approaches showed remarkable stability across different voice qualities and subglottal pressures.
The algorithms of choice, as suggested by significance tests, are closed-phase covariance analysis for the analysis of sustained vowels, and sparse linear prediction for the analysis of continuous speech. Results of data subset analysis suggest that analysis of close rounded vowels is an additional challenge in glottal flow estimation.
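The principle behind linear-prediction-based inverse filtering can be sketched briefly: estimate an all-pole model of the vocal tract from the signal, then pass the signal through the inverse (all-zero) filter; for a radiated speech pressure signal, the residual is commonly taken as an estimate of the glottal flow derivative. The Python sketch below uses the autocorrelation method with the Levinson–Durbin recursion, a simpler stand-in that is neither closed-phase covariance analysis nor sparse linear prediction as evaluated in the paper; the function names are hypothetical, and windowing, pre-emphasis, and the choice of analysis interval are omitted.

```python
from typing import List, Sequence, Tuple

def autocorr(x: Sequence[float], max_lag: int) -> List[float]:
    """Unnormalized autocorrelation r[0..max_lag] (autocorrelation method)."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) for k in range(max_lag + 1)]

def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Predictor coefficients a[0..order-1] for x[n] ~ sum_k a[k] * x[n-1-k].

    Returns the coefficients and the final prediction-error energy.
    """
    a: List[float] = []
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i.
        k = (r[i] - sum(a[j] * r[i - 1 - j] for j in range(i - 1))) / err
        a = [a[j] - k * a[i - 2 - j] for j in range(i - 1)] + [k]
        err *= 1.0 - k * k
    return a, err

def inverse_filter(x: Sequence[float], a: Sequence[float]) -> List[float]:
    """Residual e[n] = x[n] - sum_k a[k] * x[n-1-k], assuming zero initial conditions."""
    p = len(a)
    return [x[n] - sum(a[k] * x[n - 1 - k] for k in range(p) if n - 1 - k >= 0)
            for n in range(len(x))]

# Toy signal: impulse response of a one-pole filter, x[n] = 0.9 ** n.
signal = [0.9 ** n for n in range(200)]
coeffs, _ = levinson_durbin(autocorr(signal, 1), order=1)
residual = inverse_filter(signal, coeffs)  # close to a unit impulse
```

On this toy signal the inverse filter recovers the single-impulse excitation; on real speech the model order, analysis window, and (for closed-phase methods) the restriction to the glottal closed phase all matter.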
2016
C. E. Stepp, M. Zañartu, D. D. Mehta, and R. E. Hillman, “Hyperfunctional voice disorders: Current results, clinical implications, and future directions of a multidisciplinary research program,” Proceedings of the Annual Convention of the American Speech-Language-Hearing Association, 2016.
A. S. Fryd, J. H. Van Stan, R. E. Hillman, and D. D. Mehta, “Estimating subglottal pressure from neck-surface acceleration during normal voice production,” Journal of Speech, Language, and Hearing Research, vol. 59, no. 6, pp. 1335-1345, 2016.

Purpose: The purpose of this study was to evaluate the potential for estimating subglottal air pressure using a neck-surface accelerometer and to compare the accuracy of predicting subglottal air pressure relative to predicting acoustic sound pressure level (SPL).

Method: Indirect estimates of subglottal pressure (Psg′) were obtained from 10 vocally healthy speakers during loud-to-soft repetitions of 3 different /p/–vowel gestures (/pa/, /pi/, /pu/) at 3 pitch levels in the modal register. Intraoral air pressure, neck-surface acceleration, and radiated acoustic pressure were recorded, and the root-mean-square amplitude of the acceleration signal was correlated with Psg′ and SPL.

Results: The coefficient of determination between accelerometer level and Psg′ was high when data were pooled from all vowel and pitch contexts for each participant (r² = .68–.93). These relationships were stronger than corresponding relationships between accelerometer level and SPL (r² = .46–.81). The average 95% prediction interval for estimating Psg′ using accelerometer level was ±2.53 cm H₂O, ranging from ±1.70 to ±3.74 cm H₂O across participants.

Conclusions: Accelerometer signal amplitude correlated more strongly with Psg′ than with SPL. Future work is warranted to investigate the robustness of the relationship in nonmodal voice qualities, individuals with voice disorders, and accelerometer-based ambulatory monitoring of subglottal pressure.
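The two summary statistics in this abstract, the coefficient of determination and the 95% prediction interval, can be sketched for a single predictor in Python as follows. This is a hedged illustration, not the authors' procedure: it assumes simple ordinary least-squares regression of Psg′ on accelerometer level, and it approximates the prediction-interval half-width as 1.96 times the residual standard error, which ignores the t-distribution correction and the leverage term that an exact interval would include. The function name `regression_summary` and the toy values are hypothetical.

```python
import math
from typing import Sequence, Tuple

def regression_summary(x: Sequence[float],
                       y: Sequence[float]) -> Tuple[float, float, float, float]:
    """Return (slope, intercept, r_squared, approx_pi_half_width) for y regressed on x.

    approx_pi_half_width = 1.96 * residual standard error: a large-sample,
    normal-theory approximation that ignores uncertainty in the fitted line.
    """
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    r_squared = 1.0 - ss_res / syy
    residual_se = math.sqrt(ss_res / (n - 2))  # n - 2 degrees of freedom
    return slope, intercept, r_squared, 1.96 * residual_se

# Toy accelerometer levels (dB) and Psg' values (cm H2O).
acc_level_db = [0.0, 1.0, 2.0, 3.0]
psg = [0.0, 1.0, 1.0, 2.0]
slope, intercept, r2, half_width = regression_summary(acc_level_db, psg)
```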

R. E. Hillman, D. Mehta, C. Stepp, J. Van Stan, and M. Zañartu, “Objective assessment of vocal hyperfunction,” Proceedings of The Journal of the Acoustical Society of America, vol. 139, pp. 2193-2194, 2016.
R. L. Horwitz-Martin, et al., “Relation of automatically extracted formant trajectories with intelligibility loss and speaking rate decline in amyotrophic lateral sclerosis,” Proceedings of InterSpeech, pp. 1205-1209, 2016.
D. Mehta, J. Van Stan, and R. Hillman, “Relationships between vocal function measures derived from an acoustic microphone and a subglottal neck-surface accelerometer,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 4, pp. 659-668, 2016.

Monitoring subglottal neck-surface acceleration has received renewed attention due to the ability of low-profile accelerometers to confidentially and noninvasively track properties related to normal and disordered voice characteristics and behavior. This study investigated the ability of subglottal neck-surface acceleration to yield vocal function measures traditionally derived from the acoustic voice signal and to help guide the development of clinically functional accelerometer-based measures from a physiological perspective. Results are reported for 82 adult speakers with voice disorders and 52 adult speakers with normal voices who produced the sustained vowels /ɑ/, /i/, and /u/ at a comfortable pitch and loudness during the simultaneous recording of radiated acoustic pressure and subglottal neck-surface acceleration. As expected, timing-related measures of jitter exhibited the strongest correlation between acoustic and neck-surface acceleration waveforms (r ≤ 0.99), whereas amplitude-based measures of shimmer correlated less strongly (r ≤ 0.74). Additionally, weaker correlations were exhibited by spectral measures of harmonics-to-noise ratio (r ≤ 0.69) and tilt (r ≤ 0.57), whereas the cepstral peak prominence correlated more strongly (r ≤ 0.90). These empirical relationships provide evidence to support the use of accelerometers as effective complements to acoustic recordings in the assessment and monitoring of vocal function in the laboratory, clinic, and during an individual’s daily activities.
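Jitter and shimmer are both cycle-to-cycle perturbation measures, applied to periods and amplitudes respectively, which is why the same computation can be run on acoustic or accelerometer cycles. The Python sketch below uses the common "local" definitions (as in Praat's local jitter and local shimmer); the abstract does not state which jitter and shimmer variants the study used, cycle extraction is assumed to have been done already, and the function name and values are hypothetical.

```python
from typing import Sequence

def local_perturbation(values: Sequence[float]) -> float:
    """Mean absolute difference between consecutive values, divided by the mean value.

    On cycle periods this gives local jitter; on cycle peak amplitudes it gives
    local shimmer. Both are fractions (multiply by 100 for percent).
    """
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values[:-1], values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))

# Hypothetical cycle-level data from either the microphone or the ACC signal.
periods_s = [0.010, 0.011, 0.010, 0.011]
peak_amplitudes = [1.0, 0.9, 1.0, 0.9]
jitter = local_perturbation(periods_s)         # 0.001 / 0.0105, about 9.5%
shimmer = local_perturbation(peak_amplitudes)  # 0.1 / 0.95, about 10.5%
```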

2015
A. S. Fryd, J. H. Van Stan, R. E. Hillman, and D. D. Mehta, “Estimating subglottal pressure during phonation with a neck-surface accelerometer sensor,” Proceedings of the Annual Convention of the American Speech-Language-Hearing Association, 2015.
Jón Guðnason, D. D. Mehta, and T. F. Quatieri, “Evaluation of speech inverse filtering techniques using a physiologically-based synthesizer,” Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 2015.
A. F. Llico, et al., “Real-time estimation of aerodynamic features for ambulatory voice biofeedback,” The Journal of the Acoustical Society of America, vol. 138, no. 1, pp. EL14-EL19, 2015.
J. R. Williamson, T. F. Quatieri, B. S. Helfer, G. Ciccarelli, and D. D. Mehta, “Segment-dependent dynamics in predicting Parkinson’s disease,” Proceedings of InterSpeech, pp. 518-522, 2015.
D. D. Mehta and P. J. Wolfe, “Statistical properties of linear prediction analysis underlying the challenge of formant bandwidth estimation,” The Journal of the Acoustical Society of America, vol. 137, no. 2, pp. 944-950, 2015.
D. D. Mehta, et al., “Using ambulatory voice monitoring to investigate common voice disorders: Research update,” Frontiers in Bioengineering and Biotechnology, vol. 3, no. 155, pp. 1-14, 2015.

Many common voice disorders are chronic or recurring conditions that are likely to result from inefficient and/or abusive patterns of vocal behavior, referred to as vocal hyperfunction. The clinical management of hyperfunctional voice disorders would be greatly enhanced by the ability to monitor and quantify detrimental vocal behaviors during an individual’s activities of daily life. This paper provides an update on ongoing work that uses a miniature accelerometer on the neck surface below the larynx to collect a large set of ambulatory data on patients with hyperfunctional voice disorders (before and after treatment) and matched-control subjects. Three types of analysis approaches are being employed in an effort to identify the best set of measures for differentiating among hyperfunctional and normal patterns of vocal behavior: (1) ambulatory measures of voice use that include vocal dose and voice quality correlates, (2) aerodynamic measures based on glottal airflow estimates extracted from the accelerometer signal using subject-specific vocal system models, and (3) classification based on machine learning and pattern recognition approaches that have been used successfully in analyzing long-term recordings of other physiological signals. Preliminary results demonstrate the potential for ambulatory voice monitoring to improve the diagnosis and treatment of common hyperfunctional voice disorders.
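One of the ambulatory voice-use measures mentioned here, vocal dose, accumulates voicing over time. The Python sketch below uses the commonly cited time-dose and cycle-dose definitions; it assumes a frame-based f0 track derived from the accelerometer with unvoiced frames marked as 0, which is a hypothetical input format, and it is not the monitor's actual implementation. The function name and frame values are hypothetical.

```python
from typing import Sequence, Tuple

def vocal_doses(f0_hz: Sequence[float], frame_s: float) -> Tuple[float, float]:
    """Time dose (seconds of voicing) and cycle dose (vibration cycles).

    f0_hz: per-frame fundamental frequency, with 0 marking unvoiced frames.
    frame_s: frame duration in seconds.
    """
    voiced = [f0 for f0 in f0_hz if f0 > 0]
    time_dose = len(voiced) * frame_s   # total voiced time
    cycle_dose = sum(voiced) * frame_s  # f0 accumulated over voiced time
    return time_dose, cycle_dose

# Hypothetical 50 ms frames: three voiced, two unvoiced.
time_dose, cycle_dose = vocal_doses([200.0, 0.0, 210.0, 190.0, 0.0], frame_s=0.05)
```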
