In vocally healthy children and adults, speaking voice loudness differences can significantly confound acoustic perturbation measurements. This study examines the effects of voice sound pressure level (SPL) on jitter, shimmer, and harmonics-to-noise ratio (HNR) in adults with voice disorders and a control group with normal vocal status.
This is a matched case-control study.
We assessed 58 adult female voice patients matched according to approximate age and occupation with 58 vocally healthy women. Diagnoses included vocal fold nodules (n = 39, 67.2%), polyps (n = 5, 8.6%), and muscle tension dysphonia (n = 14, 24.1%). All participants sustained the vowel /a/ at soft, comfortable, and loud phonation levels. Acoustic voice SPL, jitter, shimmer, and HNR were computed using Praat. The effects of loudness condition, voice SPL, pathology, differential diagnosis, age, and professional voice use level on acoustic perturbation measures were assessed using linear mixed models and Wilcoxon signed rank tests.
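As a rough illustration of the perturbation measures named above, the sketch below computes local jitter and local shimmer as the mean absolute difference between consecutive cycles divided by the mean value. It assumes that cycle periods and peak amplitudes have already been extracted from the sustained vowel. The function names and the sample values are hypothetical, and this is not a reimplementation of Praat's analysis pipeline.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference of consecutive values, relative to the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_local(periods_s):
    """Cycle-to-cycle variation of the glottal period (unitless ratio)."""
    return local_perturbation(periods_s)


def shimmer_local(peak_amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (unitless ratio)."""
    return local_perturbation(peak_amplitudes)


# Hypothetical cycle data for a short stretch of a sustained /a/ (not study data).
periods = [0.0050, 0.0051, 0.0049, 0.0050]   # seconds
amplitudes = [0.80, 0.82, 0.79, 0.81]        # arbitrary linear units

print(f"jitter (local):  {100 * jitter_local(periods):.2f} %")
print(f"shimmer (local): {100 * shimmer_local(amplitudes):.2f} %")
```

Both measures are ratios of cycle-to-cycle differences to a mean value, so any condition that changes vibratory regularity, such as phonation level, changes them directly.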
In both patient and normative control groups, increasing voice SPL correlated significantly (P < 0.001) with decreased jitter and shimmer, and increased HNR. Voice pathology and differential diagnosis were not linked to systematically higher jitter and shimmer. HNR levels, however, were statistically higher in the patient group than in the control group at comfortable phonation levels. Professional voice use level had a significant effect (P < 0.05) on jitter, shimmer, and HNR.
The clinical value of acoustic jitter, shimmer, and HNR may be limited if speaking voice SPL and professional voice use level effects are not controlled for. Future studies are warranted to investigate whether perturbation measures are useful clinical outcome metrics when controlling for these effects.
This study examined the relationship between the magnitude of neck-surface vibration (NSVMag; transduced with an accelerometer) and intraoral estimates of subglottal pressure (P'sg) during variations in vocal effort at 3 intensity levels.
Twelve vocally healthy adults produced strings of /pɑ/ syllables in 3 vocal intensity conditions, while increasing vocal effort during each condition. Measures were made of P'sg (estimated during stop-consonant closure), NSVMag (measured during the following vowel), sound pressure level, and respiratory kinematics. Mixed linear regression was used to analyze the relationship between NSVMag and P'sg with respect to total lung volume excursion, levels of lung volume initiation and termination, airflow, laryngeal resistance, and vocal efficiency across intensity conditions.
NSVMag was significantly related to P'sg (p < .001), and there was a significant, although small, interaction between NSVMag and intensity condition. Total lung excursion was the only additional variable contributing to predicting the NSVMag-P'sg relationship.
NSVMag closely reflects P'sg during variations of vocal effort; however, the relationship changes across different intensities in some individuals. Future research should explore additional NSV-based measures (e.g., glottal airflow features) to improve estimation accuracy during voice production.
Relative fundamental frequency (RFF) has shown promise as an acoustic measure of voice, but the subjective and time-consuming nature of its manual estimation has made clinical translation infeasible. Here, a faster, more objective algorithm for RFF estimation is evaluated in a large and diverse sample of individuals with and without voice disorders.
Acoustic recordings were collected from 154 individuals with voice disorders and 36 age- and sex-matched controls with typical voices. These recordings were split into training and 2 testing sets. Using an algorithm tuned to the training set, semi-automated RFF estimates in the testing sets were compared to manual RFF estimates derived from 3 trained technicians.
The semi-automated RFF estimations were highly correlated (r = 0.82–0.91) with the manual RFF estimates.
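The agreement reported above can be illustrated with a product-moment correlation between paired estimates. The sketch below assumes that r denotes Pearson's coefficient, which the abstract does not state. The function name and the paired values are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


# Hypothetical paired RFF estimates in semitones (illustrative only).
manual = [-0.5, -1.2, -2.0, -0.8]
semi_automated = [-0.4, -1.0, -2.3, -0.9]

print(f"r = {pearson_r(manual, semi_automated):.2f}")
```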
Fast and more objective estimation of RFF makes large-scale RFF analysis feasible. This algorithm allows for future work to optimize RFF measures and expand their potential for clinical voice assessment.
Purpose The purpose of this article is to examine the ability of an acoustic measure, relative fundamental frequency (RFF), to distinguish between two subtypes of vocal hyperfunction (VH): phonotraumatic (PVH) and non-phonotraumatic (NPVH).
Method RFF values were compared among control individuals with typical voices (N = 49), individuals with PVH (N = 54), and individuals with NPVH (N = 35).
Results Offset Cycle 10 RFF differed significantly among all 3 groups, with values progressively decreasing for controls, individuals with NPVH, and individuals with PVH. Individuals with PVH also had lower RFF values for Offset Cycles 8 and 9 relative to the other 2 groups and lower RFF values for Offset Cycle 7 relative to controls. There was also a trend for lower Onset Cycle 1 RFF values for the PVH group compared with the NPVH group.
Conclusions RFF values were significantly different between controls and individuals with VH and also between the two subtypes of VH. This study adds further support to the notion that the differences between these two subsets of VH may be functional as well as structural.
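The abstract does not restate how RFF is computed. The sketch below follows the convention commonly used in the RFF literature, treated here as an assumption: each cycle's fundamental frequency is expressed in semitones relative to a steady-state reference cycle (Offset Cycle 1 for the voicing offset, Onset Cycle 10 for the voicing onset). The function names and the frequency values are hypothetical.

```python
from math import log2


def rff_semitones(cycle_f0_hz, reference_f0_hz):
    """Express each cycle's f0 in semitones relative to a reference f0."""
    if reference_f0_hz <= 0 or any(f <= 0 for f in cycle_f0_hz):
        raise ValueError("frequencies must be positive")
    return [12 * log2(f / reference_f0_hz) for f in cycle_f0_hz]


def offset_rff(offset_cycle_f0_hz):
    """Offset cycles 1..10; assumed reference is the first (steadiest) cycle."""
    return rff_semitones(offset_cycle_f0_hz, offset_cycle_f0_hz[0])


def onset_rff(onset_cycle_f0_hz):
    """Onset cycles 1..10; assumed reference is the last (steadiest) cycle."""
    return rff_semitones(onset_cycle_f0_hz, onset_cycle_f0_hz[-1])


# Hypothetical f0 values (Hz) for the 10 cycles before a voiceless consonant.
offset_f0 = [200.0, 200.0, 199.5, 199.0, 198.5, 198.0, 197.0, 196.0, 194.5, 192.0]

for cycle, value in enumerate(offset_rff(offset_f0), start=1):
    print(f"Offset Cycle {cycle:2d}: {value:+.2f} ST")
```

Under this convention the reference cycle is always 0 ST, and a negative Offset Cycle 10 value means that f0 falls as voicing ends.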
This article provides a summary of some recent innovations in voice assessment expected to have an impact in the next 5–10 years on how patients with voice disorders are clinically managed by speech-language pathologists. Specific innovations discussed are in the areas of laryngeal imaging, ambulatory voice monitoring, and “big data” analysis using machine learning to produce new metrics for vocal health. Also discussed is the potential for using voice analysis to detect and monitor other health conditions.
This study analyzes signals recorded using a neck-surface accelerometer from subjects producing speech with different voice modes. The purpose is to explore whether the recorded waveforms can capture the glottal vibratory patterns that can be related to the movement of the vocal folds and thus voice quality. The accelerometer waveforms do not contain the supraglottal resonances, and these characteristics make the proposed method suitable for real-life voice quality assessment and monitoring as it does not breach patient privacy. The experiments with a Gaussian mixture model classifier demonstrate that different voice qualities produce distinctly different accelerometer waveforms. The system achieved 80.2% and 89.5% for frame- and utterance-level accuracy, respectively, for classifying among modal, breathy, pressed, and rough voice modes using a speaker-dependent classifier. Finally, the article presents characteristic waveforms for each modality and discusses their attributes.
It has been proven that the improper function of the vocal folds can result in perceptually distorted speech that is typically identified with various speech pathologies or even some neurological diseases. As a consequence, researchers have focused on finding quantitative voice characteristics to objectively assess and automatically detect non-modal voice types. The bulk of the research has focused on classifying the speech modality by using the features extracted from the speech signal. This paper proposes a different approach that focuses on analyzing the signal characteristics of the electroglottogram (EGG) waveform. The core idea is that modal and different kinds of non-modal voice types produce EGG signals that have distinct spectral/cepstral characteristics. As a consequence, they can be distinguished from each other by using standard cepstral-based features and a simple multivariate Gaussian mixture model. The practical usability of this approach has been verified in the task of classifying among modal, breathy, rough, pressed and soft voice types. We have achieved 83% frame-level accuracy and 91% utterance-level accuracy by training a speaker-dependent system.
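The difference between frame-level and utterance-level accuracy can be illustrated with a small sketch. It assumes that some per-class model (for example, one Gaussian mixture per voice type) has already produced a log-likelihood for every frame, and that the utterance-level decision sums those log-likelihoods over frames. The abstract does not state how utterance-level decisions were made, so the pooling rule, the function names, and the numbers below are all hypothetical.

```python
def frame_decisions(frame_logliks):
    """Pick the most likely class for each frame independently."""
    return [max(scores, key=scores.get) for scores in frame_logliks]


def utterance_decision(frame_logliks):
    """Pick the class with the highest log-likelihood summed over all frames."""
    totals = {}
    for scores in frame_logliks:
        for label, loglik in scores.items():
            totals[label] = totals.get(label, 0.0) + loglik
    return max(totals, key=totals.get)


def accuracy(predicted, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


# Hypothetical per-frame log-likelihoods for one utterance whose true label is "modal".
frames = [
    {"modal": -1.0, "breathy": -2.0},
    {"modal": -3.0, "breathy": -2.5},  # this frame alone is misclassified
    {"modal": -1.5, "breathy": -2.0},
]

per_frame = frame_decisions(frames)
print("frame decisions:   ", per_frame)
print("frame accuracy:    ", accuracy(per_frame, ["modal"] * len(frames)))
print("utterance decision:", utterance_decision(frames))
```

Pooling evidence over frames lets a few misclassified frames be outvoted, which is one plausible reason an utterance-level figure exceeds the frame-level figure.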
Purpose To determine what research evidence exists to support the use of voice measures in the clinical assessment of patients with voice disorders.
Method The American Speech-Language-Hearing Association (ASHA) National Center for Evidence-Based Practice in Communication Disorders staff searched 29 databases for peer-reviewed English-language articles between January 1930 and April 2009 that included key words pertaining to objective and subjective voice measures, voice disorders, and diagnostic accuracy. The identified articles were systematically assessed by an ASHA-appointed committee employing a modification of the critical appraisal of diagnostic evidence rating system.
Results One hundred articles met the search criteria. The majority of studies investigated acoustic measures (60%) and focused on how well a test method identified the presence or absence of a voice disorder (78%). Only 17 of the 100 articles were judged to contain adequate evidence for the measures studied to be formally considered for inclusion in clinical voice assessment.
Conclusion Results provide evidence for selected acoustic, laryngeal imaging-based, auditory-perceptual, functional, and aerodynamic measures to be used as effective components in a clinical voice evaluation. However, there is clearly a pressing need for further high-quality research to produce sufficient evidence on which to recommend a comprehensive set of methods for a standard clinical voice evaluation.
In this article, we provide a brief summary of the major technological advances that led to current methods for imaging vocal fold vibration during phonation including the development of indirect laryngoscopy, imaging of rapid motion, fiber optics, and digital image capture. We also provide a brief overview of new emerging technologies that could be used in the future for voice research and clinical voice assessment, including advances in laryngeal high-speed videoendoscopy, depth-kymography, and dynamic optical coherence tomography.