Conference Papers

D. D. Mehta, P. C. Chwalek, T. F. Quatieri, and L. J. Brattain, “Wireless neck-surface accelerometer and microphone on flex circuit with application to noise-robust monitoring of Lombard speech,” in Interspeech, 2017.
Ambulatory monitoring of real-world voice characteristics and behavior has the potential to provide important assessment of voice and speech disorders and psychological and emotional state. In this paper, we report on the novel development of a lightweight, wireless voice monitor that synchronously records dual-channel data from an acoustic microphone and a neck-surface accelerometer embedded on a flex circuit. Lombard speech effects were investigated in pilot data from four adult speakers with normal vocal function who read a phonetically balanced paragraph in the presence of different ambient acoustic noise levels. Whereas the signal-to-noise ratio (SNR) of the microphone signal decreased in the presence of increasing ambient noise level, the SNR of the accelerometer sensor remained high. Lombard speech properties were thus robustly computed from the accelerometer signal and observed in all four speakers who exhibited increases in average estimates of sound pressure level (+2.3 dB), fundamental frequency (+21.4 Hz), and cepstral peak prominence (+1.3 dB) from quiet to loud ambient conditions. Future work calls for ambulatory data collection in naturalistic environments, where the microphone acts as a sound level meter and the accelerometer functions as a noise-robust voicing sensor to assess voice disorders, neurological conditions, and cognitive load.
M. Borsky, M. Cocude, D. D. Mehta, M. Zañartu, and J. Gudnason, “Classification of voice modes using neck-surface accelerometer data,” in International Conference on Acoustics, Speech, and Signal Processing, 2017.


This study analyzes signals recorded using a neck-surface accelerometer from subjects producing speech with different voice modes. The purpose is to explore whether the recorded waveforms can capture the glottal vibratory patterns which can be related to the movement of the vocal folds and thus voice quality. The accelerometer waveforms do not contain the supraglottal resonances, and these characteristics make the proposed method suitable for real-life voice quality assessment and monitoring as it does not breach patient privacy. The experiments with a Gaussian mixture model classifier demonstrate that different voice qualities produce distinctly different accelerometer waveforms. The system achieved 80.2% and 89.5% for frame- and utterance-level accuracy, respectively, for classifying among modal, breathy, pressed, and rough voice modes using a speaker-dependent classifier. Finally, the article presents characteristic waveforms for each modality and discusses their attributes.


M. Borsky, D. D. Mehta, J. P. Gudjohnsen, and J. Gudnason, “Classification of voice modality using electroglottogram waveforms,” in INTERSPEECH, 2016.


It has been proven that the improper function of the vocal folds can result in perceptually distorted speech that is typically identified with various speech pathologies or even some neurological diseases. As a consequence, researchers have focused on finding quantitative voice characteristics to objectively assess and automatically detect non-modal voice types. The bulk of the research has focused on classifying the speech modality by using the features extracted from the speech signal. This paper proposes a different approach that focuses on analyzing the signal characteristics of the electroglottogram (EGG) waveform. The core idea is that modal and different kinds of non-modal voice types produce EGG signals that have distinct spectral/cepstral characteristics. As a consequence, they can be distinguished from each other by using standard cepstral-based features and a simple multivariate Gaussian mixture model. The practical usability of this approach has been verified in the task of classifying among modal, breathy, rough, pressed and soft voice types. We have achieved 83% frame-level accuracy and 91% utterance-level accuracy by training a speaker-dependent system.


N. Iftimia, G. Maguluri, E. Chang, J. Park, J. Kobler, and D. Mehta, “Dynamic vocal fold imaging with combined optical coherence tomography/high-speed video endoscopy,” Proceedings of the 10th International Conference on Voice Physiology and Biomechanics, pp. 1-2, 2016.
R. L. Horwitz-Martin, et al., “Relation of automatically extracted formant trajectories with intelligibility loss and speaking rate decline in amyotrophic lateral sclerosis,” Proceedings of InterSpeech, pp. 1205-1209, 2016.
G. Maguluri, E. Chang, N. Iftimia, D. Mehta, and J. Kobler, “Dynamic vocal fold imaging by integrating optical coherence tomography with laryngeal high-speed video endoscopy,” Proceedings of the Conference on Lasers and Electro-Optics (CLEO), pp. 1-2, 2015.

We demonstrate three-dimensional vocal fold imaging during phonation by integrating optical coherence tomography with high-speed videoendoscopy. Results from ex vivo larynx experiments yield reconstructed vocal fold surface contours for ten phases of periodic motion.

J. Guðnason, D. D. Mehta, and T. F. Quatieri, “Evaluation of speech inverse filtering techniques using a physiologically-based synthesizer,” Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, 2015.
J. R. Williamson, T. F. Quatieri, B. S. Helfer, G. Ciccarelli, and D. D. Mehta, “Segment-dependent dynamics in predicting Parkinson’s disease,” Proceedings of InterSpeech, pp. 518-522, 2015.
T. F. Quatieri, et al., “Vocal biomarkers to discriminate cognitive load in a working memory task,” Proceedings of InterSpeech, pp. 2684-2688, 2015.
J. Guðnason, D. D. Mehta, and T. F. Quatieri, “Closed phase estimation for inverse filtering the oral airflow waveform,” Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 920-924, 2014.

Glottal closed phase estimation during speech production is critical to inverse filtering and, although addressed for radiated acoustic pressure analysis, must be better understood for the analysis of the oral airflow volume velocity signal that provides important properties of healthy and disordered voices. This paper compares the estimation of the closed phase from the acoustic speech signal and the oral airflow waveform recorded using a pneumotachograph mask. Results are presented for ten adult speakers with normal voices who sustained a set of vowels at a comfortable pitch and loudness. With electroglottography as reference, the identification rate and accuracy of glottal closure instants for the oral airflow are 96.8 % and 0.28 ms, whereas these metrics are 99.4 % and 0.10 ms for the acoustic signal. We conclude that glottal closure detection is adequate for closed phase inverse filtering but that improvements to detection of glottal opening instants on the oral airflow signal are warranted.

J. R. Williamson, T. F. Quatieri, B. S. Helfer, G. Ciccarelli, and D. D. Mehta, “Vocal and facial biomarkers of depression based on motor incoordination and timing,” Proceedings of the Fourth International Audio/Visual Emotion Challenge (AVEC 2014), 22nd ACM International Conference on Multimedia, pp. 65-72, 2014.
J. R. Williamson, T. F. Quatieri, B. S. Helfer, R. L. Horwitz, B. Yu, and D. D. Mehta, “Vocal and facial biomarkers of depression based on motor incoordination,” Third International Audio/Visual Emotion Challenge (AVEC 2013), 21st ACM International Conference on Multimedia, pp. 1-4, 2013.
M. Zañartu, J. C. Ho, D. D. Mehta, R. E. Hillman, and G. R. Wodicka, “Acoustic coupling during incomplete glottal closure and its effect on the inverse filtering of oral airflow,” Proceedings of Meetings on Acoustics, vol. 19, pp. 060241-7, 2013.
R. E. Hillman, et al., “Future directions in the development of ambulatory monitoring for clinical voice assessment,” Proceedings of the 10th International Conference on Advances in Quantitative Laryngology, Voice and Speech Research, 2013.
D. D. Mehta, et al., “High-speed videomicroscopy and acoustic analysis of ex vivo vocal fold vibratory asymmetry,” Proceedings of the 10th International Conference on Advances in Quantitative Laryngology, Voice and Speech Research, 2013.
D. D. Mehta, M. Zañartu, J. H. Van Stan, S. W. Feng, H. A. Cheyne II, and R. E. Hillman, “Smartphone-based detection of voice disorders by long-term monitoring of neck acceleration features,” Proceedings of the IEEE International Conference on Body Sensor Networks, pp. 1-6, 2013.
M. Zañartu, et al., “Toward an objective aerodynamic assessment of vocal hyperfunction using a voice health monitor,” Proceedings of the 8th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, 2013.
D. D. Mehta, et al., “Duration of ambulatory monitoring needed to accurately estimate voice use,” Proceedings of InterSpeech: Annual Conference of the International Speech Communication Association, 2012.
D. D. Mehta, D. Rudoy, and P. J. Wolfe, “Joint source-filter modeling using flexible basis functions,” Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 5888-5891, 2011.
S. M. Lulich, M. Zañartu, D. D. Mehta, and R. E. Hillman, “Source-filter interaction in the opposite direction: Subglottal coupling and the influence of vocal fold mechanics on vowel spectra during the closed phase,” Proceedings of Meetings on Acoustics Acoustical Society of America, vol. 6, pp. 060007, 2009.