Automatic Authorship Investigation

Publication year: 2022
Pages: 219-255
Comment:

special attention will be given to the 49 texts published by Mills & Boon (henceforth M&B), a publisher specialising in Romance Fiction. The restriction to this specific set of texts is an attempt to control for text type and general topic, and hence to have a clearer measurement of author-related language use preferences. Within the M&B texts, the author of each text is known and there is one author with three texts, three with two, and forty with only one text. Our running example will focus on verification of authorship by Stephanie Howard, the author of three texts in our set, as we will do with complete systems in Section 4. (225)

---

we should strive to avoid any biases in our data, so that we can assume that the systems are indeed discovering author-related language use features. In this investigation, we do this by selecting only texts from a single text type and genre, namely romance fiction books published by the British publisher Mills & Boon in the 1990s, as present in the British National Corpus. (244-245)

---

All selected methods are able to distinguish quite well between samples from Howard’s books and samples from other authors’ books. For the traditional methods, given access to the largest feature sets, separation of the two classes is even perfect. (249)