In Press
J. H. Van Stan, D. D. Mehta, D. Sternad, R. Petit, and R. E. Hillman, “Ambulatory voice biofeedback: Relative frequency and summary feedback effects on performance and retention of reduced vocal intensity in the daily lives of participants with normal voices,” Journal of Speech, Language, and Hearing Research, In Press.


Purpose Ambulatory voice biofeedback has the potential to significantly improve voice therapy effectiveness by targeting carryover of desired behaviors outside the therapy session (i.e., retention). This study applies motor learning concepts (reduced frequency and delayed, summary feedback) that demonstrate increased retention to ambulatory voice monitoring for training nurses to talk softer during work hours.

Method Forty-eight nurses with normal voices wore the Voice Health Monitor (Mehta, Zañartu, Feng, Cheyne, & Hillman, 2012) for 6 days: 3 baseline days, 1 biofeedback day, 1 short-term retention day, and 1 long-term retention day. Participants were block-randomized into 3 different biofeedback groups: 100%, 25%, and Summary. Performance was measured in terms of compliance time below a participant-specific vocal intensity threshold.

Results All participants exhibited a significant increase in compliance time (Cohen's d = 4.5) during biofeedback days compared with baseline days. The Summary feedback group exhibited statistically smaller performance reduction during both short-term (d = 1.14) and long-term (d = 1.04) retention days compared with the 100% feedback group.

Conclusions These findings suggest that modifications in feedback frequency and timing affect retention of a modified vocal behavior in daily life. Future work calls for studying the potential beneficial impact of ambulatory voice biofeedback in participants with behaviorally based voice disorders.


J. H. Van Stan, D. D. Mehta, and R. E. Hillman, “Recent innovations in voice assessment expected to impact the clinical management of voice disorders,” Perspectives of the ASHA Special Interest Groups, vol. 2, no. SIG 3, pp. 4-13, 2017.

This article provides a summary of some recent innovations in voice assessment expected to have an impact in the next 5–10 years on how patients with voice disorders are clinically managed by speech-language pathologists. Specific innovations discussed are in the areas of laryngeal imaging, ambulatory voice monitoring, and “big data” analysis using machine learning to produce new metrics for vocal health. Also discussed is the potential for using voice analysis to detect and monitor other health conditions.

M. Borsky, M. Cocude, D. D. Mehta, M. Zañartu, and J. Gudnason, “Classification of voice modes using neck-surface accelerometer data,” in International Conference on Acoustics, Speech, and Signal Processing, 2017.


This study analyzes signals recorded using a neck-surface accelerometer from subjects producing speech with different voice modes. The purpose is to explore if the recorded waveforms can capture the glottal vibratory patterns which can be related to the movement of the vocal folds and thus voice quality. The accelerometer waveforms do not contain the supraglottal resonances, and these characteristics make the proposed method suitable for real-life voice quality assessment and monitoring as it does not breach patient privacy. The experiments with a Gaussian mixture model classifier demonstrate that different voice qualities produce distinctly different accelerometer waveforms. The system achieved 80.2% and 89.5% for frame- and utterance-level accuracy, respectively, for classifying among modal, breathy, pressed, and rough voice modes using a speaker-dependent classifier. Finally, the article presents characteristic waveforms for each modality and discusses their attributes.


J. H. Van Stan, et al., “Integration of motor learning principles into real-time ambulatory voice biofeedback and example implementation via a clinical case study with vocal fold nodules,” American Journal of Speech-Language Pathology, vol. 26, no. 1, pp. 1-10, 2017.


Purpose Ambulatory voice biofeedback (AVB) has the potential to significantly improve voice therapy effectiveness by targeting one of the most challenging aspects of rehabilitation: carryover of desired behaviors outside of the therapy session. Although initial evidence indicates that AVB can alter vocal behavior in daily life, retention of the new behavior after biofeedback has not been demonstrated. Motor learning studies repeatedly have shown retention-related benefits when reducing feedback frequency or providing summary statistics. Therefore, novel AVB settings that are based on these concepts are developed and implemented.

Method The underlying theoretical framework and resultant implementation of innovative AVB settings on a smartphone-based voice monitor are described. A clinical case study demonstrates the functionality of the new relative frequency feedback capabilities.

Results With new technical capabilities, 2 aspects of feedback are directly modifiable for AVB: relative frequency and summary feedback. Although reduced-frequency AVB was associated with improved carryover of a therapeutic vocal behavior (i.e., reduced vocal intensity) in a patient post-excision of vocal fold nodules, causation cannot be assumed.

Conclusions Timing and frequency of AVB schedules can be manipulated to empirically assess generalization of motor learning principles to vocal behavior modification and test the clinical effectiveness of AVB with various feedback schedules.


M. Borsky, D. D. Mehta, J. P. Gudjohnsen, and J. Gudnason, “Classification of voice modality using electroglottogram waveforms,” in INTERSPEECH, 2016.


It has been proven that the improper function of the vocal folds can result in perceptually distorted speech that is typically identified with various speech pathologies or even some neurological diseases. As a consequence, researchers have focused on finding quantitative voice characteristics to objectively assess and automatically detect non-modal voice types. The bulk of the research has focused on classifying the speech modality by using the features extracted from the speech signal. This paper proposes a different approach that focuses on analyzing the signal characteristics of the electroglottogram (EGG) waveform. The core idea is that modal and different kinds of non-modal voice types produce EGG signals that have distinct spectral/cepstral characteristics. As a consequence, they can be distinguished from each other by using standard cepstral-based features and a simple multivariate Gaussian mixture model. The practical usability of this approach has been verified in the task of classifying among modal, breathy, rough, pressed and soft voice types. We have achieved 83% frame-level accuracy and 91% utterance-level accuracy by training a speaker-dependent system.


M. Maffei, J. H. Van Stan, R. E. Hillman, and D. D. Mehta, “Correlating ambulatory voice measures with vocal fatigue self-ratings in individuals with MTD and normal controls,” Proceedings of the Annual Convention of the American Speech-Language-Hearing Association, 2016. Poster
C. E. Stepp, M. Zañartu, D. D. Mehta, and R. E. Hillman, “Hyperfunctional voice disorders: Current results, clinical implications, and future directions of a multidisciplinary research program,” Proceedings of the Annual Convention of the American Speech-Language-Hearing Association, 2016.
V. McKenna, A. Llico, D. Mehta, and C. Stepp, “Neck-surface acceleration as an estimate of subglottal pressure during modulated vocal effort in healthy speakers,” Proceedings of the Annual Convention of the American Speech-Language-Hearing Association, 2016.
D. D. Mehta, H. A. Cheyne II, A. Wehner, J. T. Heaton, and R. E. Hillman, “Accuracy of self-reported estimates of daily voice use in adults with normal and disordered voices,” American Journal of Speech-Language Pathology, vol. 25, no. 4, pp. 576-589, 2016. Paper
M. Brockmann-Bauser, J. E. Bohlender, and D. D. Mehta, “Acoustic perturbation measures improve with increasing vocal intensity in healthy and pathological voices,” Proceedings of the Voice Foundation Symposium, 2016.
N. Iftimia, G. Maguluri, E. Chang, J. Park, J. Kobler, and D. Mehta, “Dynamic vocal fold imaging with combined optical coherence tomography/high-speed video endoscopy,” Proceedings of the 10th International Conference on Voice Physiology and Biomechanics, pp. 1-2, 2016. Paper
A. S. Fryd, J. H. Van Stan, R. E. Hillman, and D. D. Mehta, “Estimating subglottal pressure from neck-surface acceleration during normal voice production,” Journal of Speech, Language, and Hearing Research, vol. 59, no. 6, pp. 1335-1345, 2016.

Purpose The purpose of this study was to evaluate the potential for estimating subglottal air pressure using a neck-surface accelerometer and to compare the accuracy of predicting subglottal air pressure relative to predicting acoustic sound pressure level (SPL).

Method Indirect estimates of subglottal pressure (Psg′) were obtained from 10 vocally healthy speakers during loud-to-soft repetitions of 3 different /p/–vowel gestures (/pa/, /pi/, /pu/) at 3 pitch levels in the modal register. Intraoral air pressure, neck-surface acceleration, and radiated acoustic pressure were recorded, and the root-mean-square amplitude of the acceleration signal was correlated with Psg′ and SPL.

Results The coefficient of determination between accelerometer level and Psg′ was high when data were pooled from all vowel and pitch contexts for each participant (r² = .68–.93). These relationships were stronger than corresponding relationships between accelerometer level and SPL (r² = .46–.81). The average 95% prediction interval for estimating Psg′ using accelerometer level was ±2.53 cm H2O, ranging from ±1.70 to ±3.74 cm H2O across participants.

Conclusions Accelerometer signal amplitude correlated more strongly with Psg′ than with SPL. Future work is warranted to investigate the robustness of the relationship in nonmodal voice qualities, individuals with voice disorders, and accelerometer-based ambulatory monitoring of subglottal pressure.
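The analysis summarized in this abstract (root-mean-square amplitude of the acceleration signal, then a coefficient of determination against a reference measure) can be illustrated with a minimal sketch. This is not the authors' code: the function names `rms_level_db` and `r_squared` are hypothetical, and expressing the RMS amplitude as a level in dB is an assumption made here for illustration.

```python
import math

def rms_level_db(samples):
    """Level of one frame of accelerometer samples.

    Assumption: the level is 20*log10 of the RMS amplitude,
    relative to an amplitude of 1 (arbitrary units).
    """
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(rms)

def r_squared(x, y):
    """Coefficient of determination for a simple linear fit of y on x.

    For one predictor this equals the squared Pearson correlation.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)
```

In use, one level value per /p/–vowel repetition would be paired with the corresponding pressure estimate, and `r_squared` would be computed per participant over those pairs.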

O. Murton, et al., “Impact of congestive heart failure on voice and speech production: A pilot study,” Proceedings of the Annual Scientific Meeting of the Heart Failure Society of America, 2016. Poster
R. E. Hillman, D. Mehta, C. Stepp, J. Van Stan, and M. Zañartu, “Objective assessment of vocal hyperfunction,” Proceedings of The Journal of the Acoustical Society of America, vol. 139, pp. 2193-2194, 2016.
R. L. Horwitz-Martin, et al., “Relation of automatically extracted formant trajectories with intelligibility loss and speaking rate decline in amyotrophic lateral sclerosis,” Proceedings of InterSpeech, pp. 1205-1209, 2016. Paper
D. Mehta, J. Van Stan, and R. Hillman, “Relationships between vocal function measures derived from an acoustic microphone and a subglottal neck-surface accelerometer,” IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 24, no. 4, pp. 659-668, 2016.

Monitoring subglottal neck-surface acceleration has received renewed attention due to the ability of low-profile accelerometers to confidentially and noninvasively track properties related to normal and disordered voice characteristics and behavior. This study investigated the ability of subglottal neck-surface acceleration to yield vocal function measures traditionally derived from the acoustic voice signal and help guide the development of clinically functional accelerometer-based measures from a physiological perspective. Results are reported for 82 adult speakers with voice disorders and 52 adult speakers with normal voices who produced the sustained vowels /A/, /i/, and /u/ at a comfortable pitch and loudness during the simultaneous recording of radiated acoustic pressure and subglottal neck-surface acceleration. As expected, timing-related measures of jitter exhibited the strongest correlation between acoustic and neck-surface acceleration waveforms (r ≈ 0.99), whereas amplitude-based measures of shimmer correlated less strongly (r ≈ 0.74). Additionally, weaker correlations were exhibited by spectral measures of harmonics-to-noise ratio (r ≈ 0.69) and tilt (r ≈ 0.57), whereas the cepstral peak prominence correlated more strongly (r ≈ 0.90). These empirical relationships provide evidence to support the use of accelerometers as effective complements to acoustic recordings in the assessment and monitoring of vocal function in the laboratory, clinic, and during an individual’s daily activities.

M. Ghassemi, Z. Syed, D. D. Mehta, J. H. Van Stan, R. E. Hillman, and J. V. Guttag, “Uncovering voice misuse using symbolic mismatch,” JMLR (Journal of Machine Learning Research): Workshop and Conference Proceedings, pp. 1-14, 2016. Paper
H. Aljehani, J. H. Van Stan, C. W. Haynes, and D. D. Mehta, “Ambulatory voice monitoring of a Muslim imam during Ramadan,” Proceedings of the Voice Foundation Symposium, 2015. Poster
J. H. Van Stan, D. D. Mehta, S. M. Zeitels, J. A. Burns, A. M. Barbu, and R. E. Hillman, “Average ambulatory measures of sound pressure level, fundamental frequency, and vocal dose do not differ between adult females with phonotraumatic lesions and matched control subjects,” Annals of Otology, Rhinology, and Laryngology, vol. 124, pp. 864-874, 2015.

Objectives: Clinical management of phonotraumatic vocal fold lesions (nodules, polyps) is based largely on assumptions that abnormalities in habitual levels of sound pressure level (SPL), fundamental frequency (f0), and/or amount of voice use play a major role in lesion development and chronic persistence. This study used ambulatory voice monitoring to evaluate if significant differences in voice use exist between patients with phonotraumatic lesions and normal matched controls.

Methods: Subjects were 70 adult females: 35 with vocal fold nodules or polyps and 35 age-, sex-, and occupation-matched normal individuals. Weeklong summary statistics of voice use were computed from anterior neck surface acceleration recorded using a smartphone-based ambulatory voice monitor.

Results: Paired t tests and Kolmogorov-Smirnov tests resulted in no statistically significant differences between patients and matched controls regarding average measures of SPL, f0, vocal dose measures, and voicing/voice rest periods. Paired t tests comparing f0 variability between the groups resulted in statistically significant differences with moderate effect sizes.

Conclusions: Individuals with phonotraumatic lesions did not exhibit differences in average ambulatory measures of vocal behavior when compared with matched controls. More refined characterizations of underlying phonatory mechanisms and other potentially contributing causes are warranted to better understand risk factors associated with phonotraumatic lesions.

M. Ghassemi, et al., “Corrections to ‘Learning to detect vocal hyperfunction from ambulatory neck-surface acceleration features: Initial results for vocal fold nodules’,” IEEE Transactions on Biomedical Engineering, vol. 62, no. 10, pp. 2544-2544, 2015.

In the original article, the third sentence of the second paragraph in Section III-D should have read as follows: “We first divided data using leave-one-out cross validation (LOOCV) to generate 12 subject subsets, where each subject subset consisted of randomly selected data across the 12 pairs. For each test subset, all windows from the 11 other subsets were then subdivided using fivefold cross validation (1/5th validation and 4/5th training in each fold).”
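The two-level split described in this correction can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: `nested_cv_splits` is a hypothetical name, the data are assumed to be a flat list of windows, the subsets are made roughly equal in size, and the random seed is arbitrary.

```python
import random

def nested_cv_splits(windows, n_outer=12, n_inner=5, seed=0):
    """Yield (test, folds) for each held-out outer subset.

    folds is a list of n_inner (train, validation) pairs built
    from the windows of the remaining n_outer - 1 subsets.
    """
    rng = random.Random(seed)
    shuffled = list(windows)
    rng.shuffle(shuffled)  # random assignment of windows to subsets

    # Outer level: n_outer subsets, each held out once as the test subset.
    outer = [shuffled[i::n_outer] for i in range(n_outer)]
    for k, test in enumerate(outer):
        rest = [w for j, subset in enumerate(outer) if j != k for w in subset]

        # Inner level: fivefold split (1/5 validation, 4/5 training).
        inner = [rest[i::n_inner] for i in range(n_inner)]
        folds = []
        for m, validation in enumerate(inner):
            train = [w for j, part in enumerate(inner) if j != m for w in part]
            folds.append((train, validation))
        yield test, folds
```

With 120 windows, for example, each test subset holds 10 windows and each inner fold splits the remaining 110 into 88 training and 22 validation windows.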