Computer scientists have developed an algorithm which can predict with 84 per cent accuracy whether a book will be a commercial success – and the secret is to avoid cliches and excessive use of verbs
Scientists have developed an algorithm which can analyse a book and predict with 84 per cent accuracy whether or not it will be a commercial success.
A technique called statistical stylometry, which mathematically examines the use of words and grammar, was found to be “surprisingly effective” in determining how popular a book would be.
The group of computer scientists from Stony Brook University in New York said that a range of factors determine whether or not a book will enjoy success, including “interestingness”, novelty, style of writing, and how engaging the storyline is, but admitted that external factors such as luck can also play a role.
By downloading classic books from the Project Gutenberg archive, they were able to analyse texts with their algorithm and compare its predictions with historical information on the success of each work. Everything from science fiction to classic literature and poetry was included.
It was found that the predictions matched the actual popularity of the book 84 per cent of the time.
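As described here, the 84 per cent figure is a simple accuracy score: the share of books for which the predicted label matched the recorded outcome. A minimal sketch of that calculation follows; the function name `accuracy` and the five labels are invented for illustration and are not data from the study.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that match the recorded outcomes."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one book to score")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Invented labels for five hypothetical books: True means "successful".
predicted = [True, False, True, True, False]
actual = [True, False, False, True, False]
score = accuracy(predicted, actual)  # 4 of the 5 labels match
```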
They identified several traits common to successful books, including heavy use of conjunctions such as “and” and “but” and large numbers of nouns and adjectives.
Less successful work tended to include more verbs and adverbs and relied on words that explicitly describe actions and emotions such as “wanted”, “took” or “promised”, while more successful books favoured verbs that describe thought processes such as “recognised” or “remembered”.
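As a rough illustration of the kind of word-level counting involved, the sketch below measures how often a passage uses conjunctions and the two groups of verbs mentioned above. The word lists contain only the examples quoted in this article, and the function name `style_profile` and the sample sentence are assumptions made for the example. It is not the researchers' system, which examined lexical, syntactic and discourse patterns that simple word matching cannot capture.

```python
import re
from collections import Counter

# Illustrative word lists only: taken from the examples quoted in the article,
# not from the researchers' actual feature set.
CONJUNCTIONS = {"and", "but"}
ACTION_VERBS = {"wanted", "took", "promised"}
THOUGHT_VERBS = {"recognised", "remembered"}


def style_profile(text):
    """Return the share of tokens that fall into each illustrative word group."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return {"conjunctions": 0.0, "action_verbs": 0.0, "thought_verbs": 0.0}
    counts = Counter(tokens)
    total = len(tokens)
    return {
        "conjunctions": sum(counts[w] for w in CONJUNCTIONS) / total,
        "action_verbs": sum(counts[w] for w in ACTION_VERBS) / total,
        "thought_verbs": sum(counts[w] for w in THOUGHT_VERBS) / total,
    }


# Hypothetical sample sentence (12 tokens).
sample = "She remembered the house and the garden, but he wanted to leave."
profile = style_profile(sample)
```

Comparing such proportions across a labelled collection of books would only hint at the trends reported; a real analysis would need part-of-speech tagging and a trained classifier.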
To find “less successful” books for their tests, the researchers scoured Amazon for books with low sales rankings. They also included Dan Brown’s The Lost Symbol, despite its commercial success, because of “negative critiques it had attracted from media”.
“Predicting the success of literary works poses a massive dilemma for publishers and aspiring writers alike,” said Assistant Professor Yejin Choi, one of the authors of the paper published by the Association for Computational Linguistics.
“To the best of our knowledge, our work is the first that provides quantitative insights into the connection between the writing style and the success of literary works.
“Previous work has attempted to gain insights into the ‘secret recipe’ of successful books. But most of these studies were qualitative, based on a dozen books, and focused primarily on high-level content – the personalities of protagonists and antagonists and the plots. Our work examines a considerably larger collection – 800 books – over multiple genres, providing insights into lexical, syntactic, and discourse patterns that characterise the writing styles commonly shared among the successful literature.”