Voice disorders affect a large number of adults in the United States, and their clinical evaluation heavily relies on laryngeal videostroboscopy, which captures the medial-lateral and anterior-posterior motion of the vocal folds using stroboscopic sampling. However, videostroboscopy does not provide direct visualization of the superior-inferior movement of the vocal folds, which yields important clinical insight. In this paper, we present a novel technology that complements videostroboscopic findings by adding the ability to image the coronal plane and visualize the superior-inferior movement of the vocal folds. The technology is based on optical coherence tomography, which is combined with videostroboscopy within the same endoscopic probe to provide spatially and temporally co-registered images of the mucosal wave motion, as well as vocal folds subsurface morphology. We demonstrate the capability of the rigid endoscopic probe, in a benchtop setting, to characterize the complex movement and subsurface structure of the aerodynamically driven excised larynx models within the 50 to 200 Hz phonation range. Our preliminary results encourage future development of this technology with the goal of its use for in vivo laryngeal imaging.
Excessive vocal fold collision pressures during phonation are considered to play a primary role in the formation of benign vocal fold lesions, such as nodules. The ability to accurately and reliably acquire intraglottal pressure has the potential to provide unique insights into the pathophysiology of phonotrauma. Difficulties arise, however, in directly measuring vocal fold contact pressures due to physical intrusion from the sensor that may disrupt the contact mechanics, as well as difficulty in determining probe/sensor position relative to the contact location. These issues are quantified and addressed through the implementation of a novel approach for identifying the timing and location of vocal fold contact, and measuring intraglottal and vocal fold contact pressures via a pressure probe embedded in the wall of a hemi-laryngeal flow facility. The accuracy and sensitivity of the pressure measurements are validated against ground truth values. Application to in vivo approaches are assessed by acquiring intraglottal and VF contact pressures using a synthetic, self-oscillating vocal fold model in a hemi-laryngeal configuration, where the sensitivity of the measured intraglottal and vocal fold contact pressure relative to the sensor position is explored.
The purpose of this study was to evaluate the effects of nonmodal phonation on estimates of subglottal pressure (Ps) derived from the magnitude of a neck-surface accelerometer (ACC) signal and to confirm previous findings regarding the impact of vowel contexts and pitch levels in a larger cohort of participants.
Twenty-six vocally healthy participants (18 women, 8 men) were asked to produce a series of p-vowel syllables with descending loudness in 3 vowel contexts (/a/, /i/, and /u/), 3 pitch levels (comfortable, high, and low), and 4 elicited phonatory conditions (modal, breathy, strained, and rough). Estimates of Ps for each vowel segment were obtained by averaging the intraoral air pressure plateau before and after each segment. The root-mean-square magnitude of the neck-surface ACC signal was computed for each vowel segment. Three linear mixed-effects models were used to statistically assess the effects of vowel, pitch, and phonatory condition on the linear relationship (slope and intercept) between Ps and ACC signal magnitude.
Results demonstrated statistically significant linear relationships between ACC signal magnitude and Ps within participants but with increased intercepts for the nonmodal phonatory conditions; slopes were affected to a lesser extent. Vowel and pitch contexts did not significantly affect the linear relationship between ACC signal magnitude and Ps.
The classic linear relationship between ACC signal magnitude and Ps is significantly affected when nonmodal phonation is produced by a speaker. Future work is warranted to further characterize nonmodal phonatory characteristics to improve the ACC-based prediction of Ps during naturalistic speech production.
Ambulatory voice monitoring is a promising tool for investigating phonotraumatic vocal hyperfunction (PVH), associated with the development of vocal fold lesions. Since many patients with PVH are professional vocalists, a classifier was developed to better understand phonatory mechanisms during speech and singing. Twenty singers with PVH and 20 matched healthy controls were monitored with a neck-surface accelerometer–based ambulatory voice monitor. An expert-labeled ground truth data set was used to train a logistic regression on 15 subject-pairs with fundamental frequency and autocorrelation peak amplitude as input features. Overall classification accuracy of 94.2% was achieved on the held-out test set.
Irregular pitch periods (IPPs) are associated with grammatically, pragmatically, and clinically significant types of nonmodal phonation, but are challenging to identify. Automatic detection of IPPs is desirable because accurately hand-identifying IPPs is time-consuming and requires training. The authors evaluated an algorithm developed for creaky voice analysis to automatically identify IPPs in recordings of American English conversational speech. To determine a perceptually relevant threshold probability, frame-by-frame creak probabilities were compared to hand labels, yielding a threshold of approximately 0.02. These results indicate a generally good agreement between hand-labeled IPPs and automatic detection, calling for future work investigating effects of linguistic and prosodic context.
Miniature high-bandwidth accelerometers on the anterior neck surface are used in laboratory and ambulatory settings to obtain vocal function measures. This study compared the widely applied L1–L2 measure (historically, H1–H2)—the difference between the log-magnitude of the first and second harmonics—computed from the glottal airflow waveform with L1–L2 derived from the raw neck-surface acceleration signal in 79 vocally healthy female speakers. Results showed a significant correlation (r = 0.72) between L1–L2 values estimated from both airflow and accelerometer signals, suggesting that raw accelerometer-based estimates of L1–L2 may be interpreted as reflecting glottal physiological parameters and voice quality attributes during phonation.
Purpose: Prior investigations suggest that simultaneous performance of more than 1 motor-oriented task may exacerbate speech motor deficits in individuals with Parkinson disease (PD). The purpose of the current investigation was to examine the extent to which performing a low-demand manual task affected the connected speech in individuals with and without PD. Method: Individuals with PD and neurologically healthy controls performed speech tasks (reading and extemporaneous speech tasks) and an oscillatory manual task (a counterclockwise circle-drawing task) in isolation (single-task condition) and concurrently (dual-task condition). Results: Relative to speech task performance, no changes in speech acoustics were observed for either group when the low-demand motor task was performed with the concurrent reading tasks. Speakers with PD exhibited a significant decrease in pause duration between the single-task (speech only) and dual-task conditions for the extemporaneous speech task, whereas control participants did not exhibit changes in any speech production variable between the single- and dual-task conditions. Conclusions: Overall, there were little to no changes in speech production when a low-demand oscillatory motor task was performed with concurrent reading. For the extemporaneous task, however, individuals with PD exhibited significant changes when the speech and manual tasks were performed concurrently, a pattern that was not observed for control speakers. Supplemental Material: https://doi.org/10.23641/asha. 8637008
Purpose: The purpose of the current study was to characterize clear speech production for speakers with and without Parkinson disease (PD) using several measures of working vowel space computed from frequently sampled formant trajectories. Method: The 1st 2 formant frequencies were tracked for a reading passage that was produced using habitual and clear speaking styles by 15 speakers with PD and 15 healthy control speakers. Vowel space metrics were calculated from the distribution of frequently sampled formant frequency tracks, including vowel space hull area, articulatory–acoustic vowel space, and multiple vowel space density (VSD) measures based on different percentile contours of the formant density distribution. Results: Both speaker groups exhibited significant increases in the articulatory–acoustic vowel space and VSD10, the area of the outermost (10th percentile) contour of the formant density distribution, from habitual to clear styles. These clarity-related vowel space increases were significantly smaller for speakers with PD than controls. Both groups also exhibited a significant increase in vowel space hull area; however, this metric was not sensitive to differences in the clear speech response between groups. Relative to healthy controls, speakers with PD exhibited a significantly smaller VSD90, the area of the most central (90th percentile), densely populated region of the formant space. Conclusions: Using vowel space metrics calculated from formant traces of the reading passage, the current work suggests that speakers with PD do indeed reach the more peripheral regions of the vowel space during connected speech but spend a larger percentage of the time in more central regions of formant space than healthy speakers. Additionally, working vowel space metrics based on the distribution of formant data suggested that speakers with PD exhibited less of a clarity-related increase in formant space than controls, a trend that was not observed for perimeter-based measures of vowel space area.