Unsupervised Decomposition of a Document into Authorial Components


Moshe Koppel, Navot Akiva, Idan Dershowitz, and Nachum Dershowitz. 2011. “Unsupervised Decomposition of a Document into Authorial Components.” Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, pp. 1356–1364.


We propose a novel unsupervised method for separating out distinct authorial components of a document. In particular, we show that, given a book artificially “munged” from two thematically similar biblical books, we can separate out the two constituent books almost perfectly. This allows us to automatically recapitulate many conclusions reached by Bible scholars over centuries of research. One of the key elements of our method is exploitation of differences in synonym choice by different authors.
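The synonym-choice idea can be illustrated with a toy sketch. It is not the authors' pipeline: the synonym sets, the binary feature encoding, the cosine similarity measure, and the seed-based two-way split are all assumptions made for illustration, and the names `SYNSETS`, `synonym_vector`, and `two_clusters` are hypothetical. Each text chunk is encoded by which member of each synonym set it uses, and chunks are then grouped by how similar those choices are.

```python
from itertools import combinations

# Hypothetical synonym sets; each tuple lists words assumed interchangeable.
SYNSETS = [("big", "large"), ("start", "begin"), ("quick", "fast")]


def synonym_vector(text):
    """Binary vector with one entry per synonym: 1 if the word occurs in the text."""
    words = set(text.lower().split())
    return [int(w in words) for synset in SYNSETS for w in synset]


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sum(a * a for a in u) ** 0.5
    norm_v = sum(b * b for b in v) ** 0.5
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def two_clusters(chunks):
    """Split chunks into two groups, seeded by the least similar pair of chunks."""
    vecs = [synonym_vector(c) for c in chunks]
    seed_a, seed_b = min(
        combinations(range(len(vecs)), 2),
        key=lambda pair: cosine(vecs[pair[0]], vecs[pair[1]]),
    )
    # Assign each chunk to whichever seed its synonym choices resemble more.
    return [
        0 if cosine(v, vecs[seed_a]) >= cosine(v, vecs[seed_b]) else 1
        for v in vecs
    ]


if __name__ == "__main__":
    chunks = [
        "the big dog will start a quick run",
        "a large cat will begin a fast walk",
        "the big bird will start to sing",
        "a large fish will begin to swim",
    ]
    print(two_clusters(chunks))
```

In this toy input, chunks that prefer "big" and "start" end up in one group and chunks that prefer "large" and "begin" in the other, which is the kind of signal the abstract describes at a much larger scale.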
