SCIENTIFIC NEWS AND
INNOVATION FROM ÉTS
Triz – to Predict the Success of Literary Works - By: Substance

Header picture by Ronnie Pitman, CC licence.

Could a computer program predict whether a book proposed by an author will become a bestseller? Researchers Vikas Ganjigunte Ashok, Song Feng, and Yejin Choi, from the College of Engineering and Applied Sciences and the Computer Science Department of Stony Brook University in New York, believe they have succeeded by using a statistical stylometry program they created, that is, a program that analyzes writing style statistically. Of course, book publishers already do this work (or at least try to), but predicting the success of a book is not easy when they must evaluate thousands of proposals. What would happen to the world of publishing if an algorithm were more effective and more accurate than a publisher? The program could also be useful for writers, since it would allow them to assess the potential of their work. Stylometry is also used to determine whether a literary work has been plagiarized.

The figure summarizes the suitability of different plagiarism detection approaches depending on the form of plagiarism present. Source [Img1].

The program is based on the analysis of approximately 800 novels drawn from the more than 42,000 free books in the Project Gutenberg library. The chosen books were analyzed according to their literary success: the prizes they earned and the critical reception they received.

In 1946, the Russian engineer and scientist Genrich Altshuller developed a methodology called TRIZ, the Russian acronym for the Theory of Inventive Problem Solving (Teoriya Resheniya Izobretatelskikh Zadach). He analyzed 40,000 patents, selected from 400,000 patents from around the world.

In analyzing the selected patents, Altshuller noticed that they shared common principles of innovation. He also noted that the problems encountered while designing new products were often analogous to one another, so similar solutions should be applicable. The analysis of these 40,000 patents allowed him to develop the TRIZ theory.

TRIZ process for creative problem solving. Source [Img2].

The researchers who developed the statistical stylometry algorithm analyzed the 800 selected novels to discover the common principles associated with their popularity, much as Altshuller analyzed patents to develop the TRIZ theory. Some of their findings were:

    • Prepositions, nouns, pronouns, articles, and adjectives are predictive of highly successful books;
    • Less successful books are characterized by a higher percentage of verbs, adverbs, and foreign words. They also rely more on fad words considered clichés (love), platitudes, overstatements (exhausted), and negative words (bruised);
    • The least popular books mainly described actions and emotions; conversely, the most popular used a vocabulary associated with reflection, thought, and memories;
    • The denser and more complex the novel, the more likely it is to stand out.
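
The first two findings concern how often each part of speech appears in a text. As a rough illustration only, the Python sketch below computes the share of each tag in a hand-tagged toy sentence and sums the shares for the two groups of categories named above. The function names, the tag labels, and the sample sentence are assumptions made for this example; this is not the researchers' program, and it computes descriptive proportions only, not a prediction of success.

```python
from collections import Counter

# Hypothetical coarse tag labels chosen for this illustration.
# Categories the article associates with more successful books:
NOMINAL_TAGS = {"PREP", "NOUN", "PRON", "DET", "ADJ"}
# Categories the article associates with less successful books:
VERBAL_TAGS = {"VERB", "ADV", "FW"}


def pos_distribution(tagged_tokens):
    """Return the share of each part-of-speech tag among the tokens."""
    counts = Counter(tag for _, tag in tagged_tokens)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {tag: n / total for tag, n in counts.items()}


def group_share(shares, tags):
    """Sum the shares of all tags that belong to the given group."""
    return sum(share for tag, share in shares.items() if tag in tags)


# Toy sentence tagged by hand; a real analysis would run an
# automatic tagger over the full text of each novel.
sample = [
    ("the", "DET"), ("memory", "NOUN"), ("of", "PREP"),
    ("the", "DET"), ("old", "ADJ"), ("house", "NOUN"),
    ("returned", "VERB"), ("slowly", "ADV"),
]

shares = pos_distribution(sample)
nominal = group_share(shares, NOMINAL_TAGS)
verbal = group_share(shares, VERBAL_TAGS)
print(f"nominal-type share: {nominal:.2f}, verbal-type share: {verbal:.2f}")
```

Proportions like these could serve as input features for a statistical model trained on novels whose success is already known, which is the general idea behind the analysis described here.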

Dr. Choi and her colleagues from the College of Engineering and Applied Sciences: Vikas Ashok, teaching assistant, and Song Feng, fifth-year PhD student, both from the Department of Computer Science. Source [Img3].

For more information on this algorithm, please see the following article in PDF format: Ashok, V.G., S. Feng and Y. Choi. Success with Style: Using Writing Style to Predict the Success of Novels. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1753–1764, Seattle, Washington, USA, 18-21 October 2013.