Aspiration noise during phonation: Synthesis, analysis, and pitch-scale modification


D. Mehta and T. F. Quatieri, “Aspiration noise during phonation: Synthesis, analysis, and pitch-scale modification,” Massachusetts Institute of Technology, 2006.

Thesis Type: Master's dissertation


The current study investigates the synthesis and analysis of aspiration noise in synthesized and spoken vowels. Based on the linear source-filter model of speech production, we implement a vowel synthesizer in which the aspiration noise source is temporally modulated by the periodic source waveform. Modulations in the noise source waveform and their synchrony with the periodic source are shown to be salient for natural-sounding vowel synthesis. After developing the synthesis framework, we research past approaches to separate the two additive components of the model. A challenge for analysis based on this model is the accurate estimation of the aspiration noise component that contains energy across the frequency spectrum and temporal characteristics due to modulations in the noise source. Spectral harmonic/noise component analysis of spoken vowels shows evidence of noise modulations with peaks in the estimated noise source component synchronous with both the open phase of the periodic source and with time instants of glottal closure.

Inspired by this observation of natural modulations in the aspiration noise source, we develop an alternate approach to the speech signal processing aim of accurate pitch-scale modification. The proposed strategy takes a dual processing approach, in which the periodic and noise components of the speech signal are separately analyzed, modified, and re-synthesized. The periodic component is modified using our implementation of time-domain pitch-synchronous overlap-add, and the noise component is handled by modifying characteristics of its source waveform. Since we have modeled an inherent coupling between the original periodic and aspiration noise sources, the modification algorithm is designed to preserve the synchrony between temporal modulations of the two sources. The reconstructed modified signal is perceived to be natural-sounding and generally reduces artifacts that are typically heard in current modification techniques.
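As a rough illustration of the source model described in the abstract, the Python sketch below builds a periodic glottal source and an aspiration noise source whose amplitude follows that periodic waveform, then adds the two. It is not the thesis implementation. The raised-cosine pulse shape, the open quotient, the noise gain, the envelope floor, and all function names are assumptions chosen for illustration. The sketch places noise peaks only in the open phase (the abstract also reports peaks at glottal closure), and it omits the vocal-tract filter of the source-filter model.

```python
import math
import random


def glottal_pulse_train(f0, fs, duration, open_quotient=0.6):
    """Periodic source: one raised-cosine pulse per pitch period, zero in the closed phase.

    The pulse shape and open_quotient are illustrative assumptions.
    """
    period = fs / f0                       # samples per pitch period
    n_samples = int(duration * fs)
    source = []
    for n in range(n_samples):
        phase = (n % period) / period      # position within the current period, 0..1
        if phase < open_quotient:
            # Open phase: smooth pulse rising from 0 to 1 and back to 0.
            source.append(0.5 * (1.0 - math.cos(2.0 * math.pi * phase / open_quotient)))
        else:
            # Closed phase: no glottal flow.
            source.append(0.0)
    return source


def modulated_aspiration(source, noise_gain=0.1, floor=0.2, seed=0):
    """White Gaussian noise whose envelope follows the periodic source waveform.

    floor keeps some noise in the closed phase; both parameters are assumed values.
    """
    rng = random.Random(seed)
    peak = max(source) or 1.0
    noise = []
    for s in source:
        envelope = floor + (1.0 - floor) * (s / peak)
        noise.append(noise_gain * envelope * rng.gauss(0.0, 1.0))
    return noise


def synthesize_source(f0=100.0, fs=8000, duration=0.5):
    """Return the additive mix together with its periodic and noise components."""
    periodic = glottal_pulse_train(f0, fs, duration)
    noise = modulated_aspiration(periodic)
    mixed = [p + a for p, a in zip(periodic, noise)]
    return mixed, periodic, noise


if __name__ == "__main__":
    mixed, periodic, noise = synthesize_source()
    print(len(mixed), "samples synthesized")
```

Because the noise envelope is derived directly from the periodic waveform, changing `f0` moves the noise bursts along with the pitch pulses, which is the kind of synchrony the pitch-scale modification strategy aims to preserve.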

Last updated on 10/20/2016