Source/Filter Estimation

J. R. Williamson, T. F. Quatieri, B. S. Helfer, G. Ciccarelli, and D. D. Mehta, “Segment-dependent dynamics in predicting Parkinson’s disease,” Proceedings of InterSpeech, pp. 518-522, 2015.
D. D. Mehta and P. J. Wolfe, “Statistical properties of linear prediction analysis underlying the challenge of formant bandwidth estimation,” The Journal of the Acoustical Society of America, vol. 137, no. 2, pp. 944-950, 2015.
D. D. Mehta, et al., “Using ambulatory voice monitoring to investigate common voice disorders: Research update,” Frontiers in Bioengineering and Biotechnology, vol. 3, no. 155, pp. 1-14, 2015.

Many common voice disorders are chronic or recurring conditions that are likely to result from inefficient and/or abusive patterns of vocal behavior, referred to as vocal hyperfunction. The clinical management of hyperfunctional voice disorders would be greatly enhanced by the ability to monitor and quantify detrimental vocal behaviors during an individual’s activities of daily life. This paper provides an update on ongoing work that uses a miniature accelerometer on the neck surface below the larynx to collect a large set of ambulatory data on patients with hyperfunctional voice disorders (before and after treatment) and matched-control subjects. Three types of analysis approaches are being employed in an effort to identify the best set of measures for differentiating among hyperfunctional and normal patterns of vocal behavior: (1) ambulatory measures of voice use that include vocal dose and voice quality correlates, (2) aerodynamic measures based on glottal airflow estimates extracted from the accelerometer signal using subject-specific vocal system models, and (3) classification based on machine learning and pattern recognition approaches that have been used successfully in analyzing long-term recordings of other physiological signals. Preliminary results demonstrate the potential for ambulatory voice monitoring to improve the diagnosis and treatment of common hyperfunctional voice disorders.
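The abstract lists vocal dose among the ambulatory measures without defining it. As a hedged illustration, the sketch below computes two dose measures commonly used in the voice literature: time dose (accumulated phonation time) and cycle dose (accumulated number of vocal fold oscillation cycles, approximated as the sum of f0 times frame duration over voiced frames). It assumes a voicing decision and an f0 estimate are already available for each fixed-length frame of the accelerometer signal; the function and field names are hypothetical, and this is not the project's actual processing pipeline.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class VocalDose:
    time_dose_s: float       # total phonation (voiced) time, in seconds
    cycle_dose: float        # estimated number of vocal fold oscillation cycles
    voicing_fraction: float  # phonation time as a fraction of monitored time


def vocal_dose(f0_hz: Sequence[Optional[float]], frame_s: float) -> VocalDose:
    """Accumulate time dose and cycle dose from frame-level f0 estimates.

    f0_hz[i] is the fundamental frequency of frame i in Hz, or None when the
    frame was judged unvoiced. frame_s is an assumed fixed frame length.
    """
    if frame_s <= 0:
        raise ValueError("frame_s must be positive")
    voiced = [f for f in f0_hz if f is not None and f > 0]
    time_dose = len(voiced) * frame_s
    # Each voiced frame contributes f0 * duration oscillation cycles.
    cycle_dose = sum(f * frame_s for f in voiced)
    monitored = len(f0_hz) * frame_s
    fraction = time_dose / monitored if monitored > 0 else 0.0
    return VocalDose(time_dose, cycle_dose, fraction)


# Example: four 50 ms frames, one of them unvoiced.
dose = vocal_dose([200.0, None, 210.0, 190.0], frame_s=0.05)
print(dose)
```

Over a full day of monitoring, the same accumulation would simply run over many more frames; the frame length and voicing detector are design choices this sketch leaves open.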

Y.-A. S. Lien, et al., “Voice relative fundamental frequency via neck-skin acceleration in individuals with voice disorders,” Journal of Speech, Language, and Hearing Research, vol. 58, no. 5, pp. 1482-1487, 2015.

Purpose: This study investigated the use of neck-skin acceleration for relative fundamental frequency (RFF) analysis. Method: Forty individuals with voice disorders associated with vocal hyperfunction and 20 age- and sex-matched control participants were recorded with a subglottal neck-surface accelerometer and a microphone while producing speech stimuli appropriate for RFF. Rater reliabilities, RFF means, and RFF standard deviations derived from the accelerometer were compared with those derived from the microphone. Results: RFF estimated from the accelerometer had slightly higher intrarater reliability and identical interrater reliability compared with values estimated with the microphone. Although sensor type and the Vocal Cycle × Sensor and Vocal Cycle × Sensor × Group interactions showed significant effects on RFF means, the typical RFF pattern could be derived from either sensor. For both sensors, the RFF of individuals with vocal hyperfunction was lower than that of the controls. Sensor type and its interactions did not have significant effects on RFF standard deviations. Conclusions: RFF can be reliably estimated using an accelerometer, but these values cannot be compared with those collected via microphone. Future studies are needed to determine the physiological basis of RFF and examine the effect of sensors on RFF in practical voice assessment and monitoring settings.
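RFF is conventionally computed from the instantaneous f0 of the ten vocal cycles immediately before (voicing offset) and after (voicing onset) a voiceless consonant, each expressed in semitones relative to the cycle furthest from the consonant. The sketch below assumes the cycle periods have already been measured from either the microphone or the accelerometer signal and shows only the normalization step; the function names are hypothetical.

```python
import math
from typing import List, Sequence

N_CYCLES = 10  # conventional number of cycles analyzed on each side of the consonant


def _cycle_f0(periods_s: Sequence[float]) -> List[float]:
    # Instantaneous f0 of each cycle is the reciprocal of its period.
    if len(periods_s) != N_CYCLES:
        raise ValueError(f"expected {N_CYCLES} cycle periods")
    if any(p <= 0 for p in periods_s):
        raise ValueError("periods must be positive")
    return [1.0 / p for p in periods_s]


def _semitones(f0_hz: Sequence[float], ref_hz: float) -> List[float]:
    # 12 semitones per octave: RFF = 12 * log2(f0 / f0_ref)
    return [12.0 * math.log2(f / ref_hz) for f in f0_hz]


def rff_offset(periods_s: Sequence[float]) -> List[float]:
    """RFF for the voicing-offset cycles preceding a voiceless consonant.

    Cycles are in time order; cycle 1 (furthest from the consonant) is the reference.
    """
    f0 = _cycle_f0(periods_s)
    return _semitones(f0, f0[0])


def rff_onset(periods_s: Sequence[float]) -> List[float]:
    """RFF for the voicing-onset cycles following a voiceless consonant.

    Cycles are in time order; cycle 10 (furthest from the consonant) is the reference.
    """
    f0 = _cycle_f0(periods_s)
    return _semitones(f0, f0[-1])


# Example: f0 doubles on the last offset cycle, i.e. a rise of one octave.
print(rff_offset([0.005] * 9 + [0.0025]))
```

Because both sensors feed the same normalization, any RFF differences between microphone and accelerometer arise upstream, in how cycle boundaries are located in each signal.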

J. Guðnason, D. D. Mehta, and T. F. Quatieri, “Closed phase estimation for inverse filtering the oral airflow waveform,” Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 920-924, 2014.

Glottal closed-phase estimation during speech production is critical to inverse filtering and, although it has been addressed for analysis of the radiated acoustic pressure, must be better understood for analysis of the oral airflow volume velocity signal, which provides important properties of healthy and disordered voices. This paper compares estimation of the closed phase from the acoustic speech signal and from the oral airflow waveform recorded using a pneumotachograph mask. Results are presented for ten adult speakers with normal voices who sustained a set of vowels at a comfortable pitch and loudness. With electroglottography as the reference, the identification rate and accuracy of glottal closure instants for the oral airflow are 96.8% and 0.28 ms, whereas these metrics are 99.4% and 0.10 ms for the acoustic signal. We conclude that glottal closure detection is adequate for closed-phase inverse filtering but that improvements to the detection of glottal opening instants on the oral airflow signal are warranted.
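The identification rate and accuracy quoted above are scores of detected glottal closure instants (GCIs) against reference instants. The sketch below assumes the widely used definitions introduced by Naylor et al. for the DYPSA algorithm: each reference GCI defines a larynx cycle bounded by the midpoints to its neighbours; a cycle with exactly one detection is a hit, none is a miss, more than one is a false alarm; accuracy is the standard deviation of the timing error over hits. The paper's exact scoring protocol may differ, and the function name is hypothetical.

```python
import statistics
from bisect import bisect_left
from typing import Dict, Sequence


def gci_metrics(ref_s: Sequence[float], est_s: Sequence[float]) -> Dict[str, float]:
    """Score estimated GCIs (seconds) against reference GCIs, e.g. from EGG."""
    ref = sorted(ref_s)
    est = sorted(est_s)
    if len(ref) < 3:
        raise ValueError("need at least three reference GCIs")
    hits = misses = false_alarms = 0
    errors = []
    # Only interior reference GCIs have neighbours on both sides.
    for i in range(1, len(ref) - 1):
        lo = (ref[i - 1] + ref[i]) / 2
        hi = (ref[i] + ref[i + 1]) / 2
        # Detections in the half-open larynx cycle [lo, hi).
        inside = est[bisect_left(est, lo):bisect_left(est, hi)]
        if len(inside) == 1:
            hits += 1
            errors.append(inside[0] - ref[i])
        elif not inside:
            misses += 1
        else:
            false_alarms += 1
    n = hits + misses + false_alarms
    return {
        "identification_rate": hits / n,
        "miss_rate": misses / n,
        "false_alarm_rate": false_alarms / n,
        # Accuracy: standard deviation of timing error over correctly identified cycles.
        "accuracy_s": statistics.pstdev(errors) if errors else float("nan"),
    }


# Example: one hit, one miss, and one false alarm over three scored cycles.
print(gci_metrics([0.0, 0.01, 0.02, 0.03, 0.04], [0.0102, 0.0301, 0.032]))
```

Using the standard deviation rather than the mean absolute error means a constant bias between detector and reference does not degrade the accuracy figure.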

R. E. Hillman, D. Mehta, J. H. Van Stan, M. Zañartu, M. Ghassemi, and J. V. Guttag, “Subglottal ambulatory monitoring of vocal function to improve voice disorder assessment,” The Journal of the Acoustical Society of America, vol. 136, p. 2260, 2014.
J. R. Williamson, T. F. Quatieri, B. S. Helfer, R. L. Horwitz, B. Yu, and D. D. Mehta, “Vocal and facial biomarkers of depression based on motor incoordination,” Third International Audio/Visual Emotion Challenge (AVEC 2013), 21st ACM International Conference on Multimedia, pp. 1-4, 2013.
M. Zañartu, J. C. Ho, D. D. Mehta, R. E. Hillman, and G. R. Wodicka, “Acoustic coupling during incomplete glottal closure and its effect on the inverse filtering of oral airflow,” Proceedings of Meetings on Acoustics, vol. 19, pp. 060241-7, 2013.
M. Zañartu, J. C. Ho, D. D. Mehta, R. E. Hillman, and G. R. Wodicka, “Subglottal impedance-based inverse filtering of voiced sounds using neck surface acceleration,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, pp. 1929-1939, 2013.

A model-based inverse filtering scheme is proposed for accurate, non-invasive estimation of the aerodynamic source of voiced sounds at the glottis. The approach, referred to as subglottal impedance-based inverse filtering (IBIF), takes as input the signal from a lightweight accelerometer placed on the skin over the extrathoracic trachea and yields estimates of glottal airflow and its time derivative, offering important advantages over traditional methods that deal with the supraglottal vocal tract. The proposed scheme is based on mechano-acoustic impedance representations from a physiologically based transmission line model and a lumped skin surface representation. A subject-specific calibration protocol is used to account for individual adjustments of subglottal impedance parameters and mechanical properties of the skin. Preliminary results for sustained vowels with various voice qualities show that the subglottal IBIF scheme yields estimates comparable to those of current aerodynamics-based methods of clinical vocal assessment. A mean absolute error of less than 10% was observed for two glottal airflow measures (maximum flow declination rate and amplitude of the modulation component) that have been associated with the pathophysiology of some common voice disorders caused by faulty and/or abusive patterns of vocal behavior (i.e., vocal hyperfunction). The proposed method further advances the ambulatory assessment of vocal function based on the neck acceleration signal, which previously had been limited to the estimation of phonation duration, loudness, and pitch. Subglottal IBIF is also suitable for other ambulatory applications in speech communication, for which further evaluation is underway.
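Maximum flow declination rate (MFDR) and the amplitude of the modulation (AC) component are computed from an estimated glottal airflow waveform, whatever method produced it. The sketch below assumes one glottal cycle of flow samples has already been segmented; it computes MFDR as the magnitude of the most negative first difference scaled by the sampling rate, and AC flow as the peak-to-peak amplitude of the cycle. It does not implement IBIF itself; the function name and per-cycle segmentation are assumptions, and in practice such measures are usually averaged over many cycles.

```python
from typing import Sequence, Tuple


def flow_measures(flow: Sequence[float], fs_hz: float) -> Tuple[float, float]:
    """Return (MFDR, AC flow) for one glottal cycle of airflow samples.

    MFDR is in flow units per second (e.g. L/s^2 if flow is in L/s);
    AC flow is in the same units as the flow samples.
    """
    if len(flow) < 2 or fs_hz <= 0:
        raise ValueError("need at least two samples and a positive sampling rate")
    # First difference scaled by fs approximates the flow derivative dU/dt.
    deriv = [(flow[n + 1] - flow[n]) * fs_hz for n in range(len(flow) - 1)]
    # MFDR: steepest decrease in flow, reported as a positive number.
    mfdr = max(0.0, -min(deriv))
    # AC flow: peak-to-peak amplitude of the pulsatile component.
    ac_flow = max(flow) - min(flow)
    return mfdr, ac_flow


# Example: a crude asymmetric pulse (slow opening, fast closing) sampled at 1 kHz.
print(flow_measures([0.0, 1.0, 2.0, 3.0, 4.0, 2.0, 0.0], fs_hz=1000.0))
```

A plain first difference is sensitive to noise; smoothing the flow estimate or its derivative before taking the minimum is a common refinement left out here.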

D. D. Mehta, D. Rudoy, and P. J. Wolfe, “Kalman-based autoregressive moving average modeling and inference for formant and antiformant tracking,” The Journal of the Acoustical Society of America, vol. 132, no. 3, pp. 1732-1746, 2012.
R. E. Hillman and D. D. Mehta, “Ambulatory monitoring of daily voice use,” Perspectives on Voice and Voice Disorders, vol. 21, no. 2, pp. 56-61, 2011.
D. D. Mehta, D. Rudoy, and P. J. Wolfe, “Joint source-filter modeling using flexible basis functions,” Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 5888-5891, 2011.
M. Zañartu, D. D. Mehta, J. C. Ho, G. R. Wodicka, and R. E. Hillman, “Observation and analysis of in vivo vocal fold tissue instabilities produced by nonlinear source-filter coupling: A case study,” The Journal of the Acoustical Society of America, vol. 129, pp. 326-339, 2011.
M. Zañartu, J. C. Ho, D. D. Mehta, R. E. Hillman, and G. R. Wodicka, “An impedance-based inverse filtering scheme with glottal coupling,” Proceedings of the Acoustical Society of America, 2009.
S. M. Lulich, M. Zañartu, D. D. Mehta, and R. E. Hillman, “Source-filter interaction in the opposite direction: Subglottal coupling and the influence of vocal fold mechanics on vowel spectra during the closed phase,” Proceedings of Meetings on Acoustics, Acoustical Society of America, vol. 6, pp. 060007, 2009.
S. M. Lulich, M. Zañartu, D. D. Mehta, and R. E. Hillman, “Source-filter interaction in the opposite direction: Subglottal coupling and the influence of vocal fold mechanics on vowel spectra during the closed phase,” Proceedings of the Acoustical Society of America, 2009.
D. Mehta and T. F. Quatieri, “Aspiration noise during phonation: Synthesis, analysis, and pitch-scale modification,” Massachusetts Institute of Technology, 2006.

The current study investigates the synthesis and analysis of aspiration noise in synthesized and spoken vowels. Based on the linear source-filter model of speech production, we implement a vowel synthesizer in which the aspiration noise source is temporally modulated by the periodic source waveform. Modulations in the noise source waveform and their synchrony with the periodic source are shown to be salient for natural-sounding vowel synthesis. After developing the synthesis framework, we research past approaches to separate the two additive components of the model. A challenge for analysis based on this model is the accurate estimation of the aspiration noise component that contains energy across the frequency spectrum and temporal characteristics due to modulations in the noise source. Spectral harmonic/noise component analysis of spoken vowels shows evidence of noise modulations with peaks in the estimated noise source component synchronous with both the open phase of the periodic source and with time instants of glottal closure.

Inspired by this observation of natural modulations in the aspiration noise source, we develop an alternate approach to the speech signal processing aim of accurate pitch-scale modification. The proposed strategy takes a dual processing approach, in which the periodic and noise components of the speech signal are separately analyzed, modified, and re-synthesized. The periodic component is modified using our implementation of time-domain pitch-synchronous overlap-add, and the noise component is handled by modifying characteristics of its source waveform. Since we have modeled an inherent coupling between the original periodic and aspiration noise sources, the modification algorithm is designed to preserve the synchrony between temporal modulations of the two sources. The reconstructed modified signal is perceived to be natural-sounding and generally reduces artifacts that are typically heard in current modification techniques.
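The idea of an aspiration noise source temporally modulated by the periodic source can be sketched by multiplying white noise, sample by sample, by the glottal flow waveform, so that noise is present only while the glottis is open. The Rosenberg-style pulse shape, the multiplicative modulation, and the noise_gain parameter are assumptions for illustration; the thesis synthesizer may use a different source model, the noise peaks at glottal closure reported above are not modeled, and vocal tract filtering is omitted.

```python
import math
import random
from typing import List, Tuple


def rosenberg_pulse(n: int, open_frac: float = 0.4, close_frac: float = 0.16) -> List[float]:
    """One period (n samples) of a Rosenberg-style glottal flow pulse with peak 1."""
    tp = open_frac * n   # opening-phase length in samples
    tn = close_frac * n  # closing-phase length in samples
    if tp <= 0 or tn <= 0 or tp + tn > n:
        raise ValueError("invalid open/close fractions for this period")
    pulse = []
    for k in range(n):
        if k <= tp:
            pulse.append(0.5 * (1.0 - math.cos(math.pi * k / tp)))  # opening
        elif k <= tp + tn:
            pulse.append(math.cos(math.pi * (k - tp) / (2.0 * tn)))  # closing
        else:
            pulse.append(0.0)  # closed phase: no flow
    return pulse


def breathy_source(f0_hz: float, fs_hz: float, dur_s: float,
                   noise_gain: float = 0.2,
                   seed: int = 0) -> Tuple[List[float], List[float], List[float]]:
    """Return (periodic flow, modulated aspiration noise, their sum)."""
    if f0_hz <= 0 or fs_hz <= 0 or dur_s <= 0:
        raise ValueError("f0, sampling rate, and duration must be positive")
    period = round(fs_hz / f0_hz)
    n_total = round(fs_hz * dur_s)
    pulse = rosenberg_pulse(period)
    rng = random.Random(seed)
    periodic = [pulse[k % period] for k in range(n_total)]
    # Gaussian noise scaled by the glottal flow: zero during the closed phase.
    noise = [noise_gain * g * rng.gauss(0.0, 1.0) for g in periodic]
    source = [p + q for p, q in zip(periodic, noise)]
    return periodic, noise, source


periodic, noise, source = breathy_source(f0_hz=100.0, fs_hz=10000.0, dur_s=0.05)
```

Replacing the multiplication by unmodulated (stationary) noise is an easy way to hear why the abstract describes the modulation as salient: the stationary version tends to sound like a separate hiss rather than part of the voice.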

D. Mehta and T. F. Quatieri, “Pitch-scaled modification using the modulated aspiration noise source,” Proceedings of the International Conference on Spoken Language Processing, 2006.
D. Mehta and T. F. Quatieri, “Synthesis, analysis, and pitch modification of the breathy vowel,” Paper presentation at the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 199-202, 2005.