Generative Adversarial Phonology: Modeling unsupervised allophonic learning with neural networks


Abstract:

This paper proposes that unsupervised phonetic and phonological learning of acoustic speech data can be modeled with Generative Adversarial Networks (GANs). Generative Adversarial Networks are uniquely appropriate for modeling phonetic and phonological learning because the network is trained on unannotated raw acoustic data, learning is unsupervised without any language-specific inputs, and the result is a network that learns to generate acoustic speech signals from random input variables. A GAN model for acoustic data proposed by Donahue et al. (2019) was trained on an allophonic alternation in English, where voiceless stops surface as aspirated word-initially before stressed vowels except if preceded by a sibilant [s]. The corresponding sequences of word-initial voiceless stops with and without the preceding [s] from the TIMIT database were used in training. Measurements of the voice onset time (VOT) of stops produced by the Generator network were used as a test of learning. The model successfully learned the allophonic alternation without any language-specific input: the generated speech signal contains the conditional distribution of VOT duration. The results demonstrate that Generative Adversarial Networks have potential for modeling phonetic and phonological learning, as they can successfully learn to generate an allophonic distribution from only acoustic inputs without any language-specific features in the model. The paper also discusses how the model's architecture can resemble linguistic behavior in language acquisition.
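
The test of learning described above, comparing the VOT of generated stops across the two contexts, can be sketched roughly as follows. This is a minimal illustration and not the paper's analysis: the function names, the context labels `#TV` and `#sTV`, and the numeric values are all hypothetical placeholders, and a real evaluation would use a proper statistical model instead of a comparison of means.

```python
from statistics import mean


def mean_vot_by_condition(measurements):
    """Group VOT durations (in ms) by context label and return the mean per group.

    `measurements` is an iterable of (condition, vot_ms) pairs.
    """
    groups = {}
    for condition, vot_ms in measurements:
        groups.setdefault(condition, []).append(vot_ms)
    return {condition: mean(values) for condition, values in groups.items()}


def shows_alternation(measurements, initial="#TV", after_s="#sTV"):
    """Return True if mean VOT is longer word-initially than after [s]."""
    means = mean_vot_by_condition(measurements)
    return means[initial] > means[after_s]


# Placeholder values for illustration only; NOT measurements from the paper.
toy = [("#TV", 60.0), ("#TV", 50.0), ("#sTV", 20.0), ("#sTV", 30.0)]
print(mean_vot_by_condition(toy))
print(shows_alternation(toy))
```

With real data, each pair would hold the context of a generated output (with or without the preceding [s]) and the VOT measured from that output.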