Mosab Rezaei
2025
Detecting, Generating, and Evaluating in the Writing Style of Different Authors
Mosab Rezaei
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 4: Student Research Workshop)
In recent years, stylometry has been investigated in many different fields. In this work, we address the problem of detecting, generating, and evaluating textual documents according to writing style by leveraging state-of-the-art models. In the first step, sentences will be extracted from several books, each by a different author, to create a dataset. The selected models will then be trained to detect the author of each sentence in the dataset. After that, generator models will be used to generate sentences in the authors’ writing styles from the unpaired samples in the dataset. Finally, to evaluate the generators, the previously trained models will be used to assess the generated sentences, and the distributions of various syntactic features will be compared between the original and generated sentences. We hope the results will show that models can be trained to detect and generate textual documents in the writing style of the given authors.
2024
Text vs. Transcription: A Study of Differences Between the Writing and Speeches of U.S. Presidents
Mina Rajaei Moghadam | Mosab Rezaei | Gülşat Aygen | Reva Freedman
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities
Even after many years of research, the question of how spoken and written text differ remains open. This paper aims to study syntactic features that can serve as distinguishing factors. To do so, we focus on the transcribed speeches and written books of United States presidents. We conducted two experiments to analyze high-level syntactic features. In the first experiment, we examine these features while controlling for the effect of sentence length. In the second experiment, we compare the high-level syntactic features with low-level ones. The results indicate that adding high-level syntactic features enhances model performance, particularly in longer sentences. Moreover, the importance of prepositional phrases in a sentence increases with sentence length. We also find that these longer sentences with more prepositional phrases are more likely to appear in speeches than in written books by U.S. presidents.