This essay is about the relationship between prizewinning novels and their economic counterparts, bestsellers. It is about the ways in which social distinction is symbolically manifested within the contemporary novel and how we read social difference through language.
---
We are interested in understanding the ways in which "bestsellers" and "prizewinners" cohere as categories and the extent to which this coherence is based on meaningful, and meaningfully distinguishing, textual features.
---
We use machine learning and statistical modeling to better understand the linguistic differences between these categories, especially as they relate to the category of time. Rather than rely on the static construct of the novel's "setting" as the analytic lens, we try to understand the temporal patterns that run through these works.
---
Bestsellers tend on average to be the longest novels, while romances are the shortest. Prizewinners have the highest vocabulary richness (the ratio of unique words to total words), while romances have the lowest.
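The vocabulary richness measure described above, the ratio of unique words to total words, can be sketched in a few lines of Python. This is a minimal illustration, not the essay's actual pipeline: the tokenizer, the function names, and the optional `window` parameter are all assumptions made for the example. The windowed variant is included because a raw ratio tends to fall as a text gets longer, which matters when the categories being compared differ in average length.

```python
import re


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes.

    A deliberately simple tokenizer (an assumption for this sketch).
    """
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text, window=None):
    """Return unique words divided by total words.

    If `window` is given and the text is at least that long, the ratio is
    computed on consecutive non-overlapping chunks of `window` tokens and
    averaged, so that texts of different lengths are more comparable.
    """
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    if window is None or len(tokens) < window:
        return len(set(tokens)) / len(tokens)
    chunks = [
        tokens[i:i + window]
        for i in range(0, len(tokens) - window + 1, window)
    ]
    return sum(len(set(chunk)) / window for chunk in chunks) / len(chunks)


# Toy sentence, invented for illustration only.
sample = "the cat sat on the mat and the dog sat too"
print(type_token_ratio(sample))
print(type_token_ratio(sample, window=4))
```

On a full novel, one would pass the whole text and a fixed window size, then compare the averaged ratios across categories.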
---
Romance, not surprisingly, looks the most distinctive when compared to other types of writing, especially science fiction.
---
We used machine learning to build a classifier that predicts which category a given novel belongs to. Based on the novels the learning algorithm has seen, how well can it predict the category of novels it has not seen? As Table 2 shows, the strongest distinctions exist between prizewinning novels and romances. Roughly 99 times out of 100, the computer can correctly predict whether a novel is a romance or has been shortlisted for a literary prize, which suggests that if you want to win a literary prize, writing a romance, or anything resembling one, is a very bad idea.
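The logic of the test described above, training on novels the algorithm has seen and scoring it on novels it has not, can be sketched as follows. The essay does not name its algorithm or features here, so everything in this example is an assumption: a nearest-centroid classifier stands in for the real one, the features are relative frequencies of a tiny hypothetical word list (`VOCAB`), and the "novels" are invented one-line strings. The resulting score says nothing about the essay's reported results.

```python
import math
from collections import Counter

# Hypothetical feature words, chosen only for this illustration.
VOCAB = ["love", "heart", "memory", "years"]


def features(text, vocab=VOCAB):
    """Relative frequency of each vocabulary word in the text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1
    return [counts[word] / total for word in vocab]


def train(labeled_texts):
    """Average the feature vectors of each category into one centroid."""
    by_label = {}
    for text, label in labeled_texts:
        by_label.setdefault(label, []).append(features(text))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def predict(model, text):
    """Assign the category whose centroid is closest to the text."""
    vector = features(text)
    return min(model, key=lambda label: math.dist(vector, model[label]))


def accuracy(model, held_out):
    """Share of unseen texts whose category is predicted correctly."""
    correct = sum(predict(model, text) == label for text, label in held_out)
    return correct / len(held_out)


# Invented toy texts standing in for novels (not real data).
SEEN = [
    ("her heart raced with love and love again", "romance"),
    ("love filled his heart that night", "romance"),
    ("years later the memory of that house returned", "prizewinner"),
    ("memory fades over the years he thought", "prizewinner"),
]
UNSEEN = [
    ("she gave her heart to love", "romance"),
    ("for years that memory stayed with her", "prizewinner"),
]

model = train(SEEN)
print(accuracy(model, UNSEEN))
```

The key design point is that `UNSEEN` never enters `train`, so the accuracy measures how well the learned distinction generalizes rather than how well the model memorized its examples.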
---