Syntax and Themes: How Context Free Grammar Rules and Semantic Word Association Influence Book Success

Henry Gorelick, Biddut Sarker Bijoy, Syeda Jannatus Saba, Sudipta Kar, Md Saiful Islam, Mohammad Ruhul Amin


Abstract
In this paper, we attempt to improve upon the state of the art in predicting a novel’s success by modeling the lexical semantic relationships of its contents. We created the largest dataset used in such a project, containing lexical data from 17,962 books from Project Gutenberg. We utilized domain-specific feature reduction techniques to implement the most accurate models to date for predicting book success, with our best model achieving an average accuracy of 94.0%. By analyzing the model parameters, we extracted the successful semantic relationships from books of 12 different genres. Finally, we mapped those semantic relations to a set of themes, as defined in Roget’s Thesaurus, and discovered the themes that successful books of a given genre prioritize. At the end of the paper, we further showed that our model demonstrates similar performance for book success prediction even when Goodreads rating was used instead of download count to measure success.
Anthology ID:
2021.ranlp-1.53
Volume:
Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021)
Month:
September
Year:
2021
Address:
Held Online
Venue:
RANLP
Publisher:
INCOMA Ltd.
Pages:
463–474
URL:
https://aclanthology.org/2021.ranlp-1.53
Cite (ACL):
Henry Gorelick, Biddut Sarker Bijoy, Syeda Jannatus Saba, Sudipta Kar, Md Saiful Islam, and Mohammad Ruhul Amin. 2021. Syntax and Themes: How Context Free Grammar Rules and Semantic Word Association Influence Book Success. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2021), pages 463–474, Held Online. INCOMA Ltd.
Cite (Informal):
Syntax and Themes: How Context Free Grammar Rules and Semantic Word Association Influence Book Success (Gorelick et al., RANLP 2021)
PDF:
https://preview.aclanthology.org/ingestion-script-update/2021.ranlp-1.53.pdf