Could a computer program predict whether a book proposed by an author will become a best-seller? Researchers Vikas Ganjigunte Ashok, Song Feng, and Yejin Choi, from the College of Engineering and Applied Sciences and the Computer Science Department of Stony Brook University in New York, believe they have succeeded by using a statistical stylometry program they created. Book publishers already do this work, or at least try to, but predicting the success of a book is not easy when they must evaluate thousands of proposals. What would happen to the world of publishing if an algorithm were more accurate than a publisher? The program could also be useful for writers, allowing them to assess the potential of their work. Stylometry is also used to determine whether a literary work has been plagiarized.
The program is based on the analysis of approximately 800 novels drawn from Project Gutenberg, a library of over 42,000 free books. The books chosen were analyzed according to their literary success: the prizes they earned and the critical reception they received.
In 1946, the Russian engineer and scientist Genrich Altshuller developed a methodology called TRIZ, a Russian acronym for Theory of Inventive Problem Solving (Teoriya Resheniya Izobretatelskikh Zadach). He analyzed 40,000 patents, selected from 400,000 patents from around the world.
In his analysis of the selected patents, Altshuller noticed that they shared common principles of innovation. He also noted that the problems encountered during the design of new products were analogous to one another and that similar solutions should therefore be applicable. The analysis of these 40,000 patents allowed him to develop the TRIZ theory.
The researchers who developed the statistical stylometry algorithm analyzed the 800 selected novels statistically to discover the common principles associated with their popularity, in a manner similar to Altshuller's approach with TRIZ. Some of their findings were:
- A higher percentage of prepositions, nouns, pronouns, articles, and adjectives is predictive of highly successful books;
- Less successful books are characterized by a higher percentage of verbs, adverbs, and foreign words. They also rely more on fad words regarded as clichés (love), platitudes, overstatements (exhausted), and negative words (bruised);
- The least popular books mainly describe actions and emotions; conversely, the most popular use a vocabulary associated with reflection, thought, and memory;
- The denser and more complex the novel, the more likely it is to stand out.
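To make the idea of word-category percentages concrete, here is a minimal Python sketch of how such a feature might be measured. It is not the researchers' program: the function name `category_shares` and the tiny word lists in `TOY_LEXICON` are hypothetical stand-ins, and a real system would presumably rely on a trained part-of-speech tagger and a statistical classifier rather than a hand-written lexicon.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, for illustration only. A real stylometry
# system would use a trained part-of-speech tagger instead of word lists.
TOY_LEXICON = {
    "preposition": {"in", "on", "of", "at", "with", "from", "by"},
    "article": {"the", "a", "an"},
    "pronoun": {"he", "she", "it", "they", "we", "i", "you"},
    "adverb": {"quickly", "very", "suddenly", "really"},
}


def category_shares(text):
    """Return, for each toy category, the fraction of tokens that belong to it."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for token in tokens:
        for category, words in TOY_LEXICON.items():
            if token in words:
                counts[category] += 1
    total = len(tokens)
    # Guard against empty input so we never divide by zero.
    return {
        category: (counts[category] / total if total else 0.0)
        for category in TOY_LEXICON
    }


if __name__ == "__main__":
    sample = "She walked quickly to the edge of the river."
    for category, share in category_shares(sample).items():
        print(f"{category}: {share:.2f}")
```

Computed over a whole novel, shares like these could serve as inputs to a classifier that separates more successful books from less successful ones. The four categories and the sample sentence are arbitrary choices made for this sketch.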