Abstract
Formality is one of the important characteristics of text documents. Automatic detection of a text's formality level is potentially beneficial for various natural language processing tasks. Previously, two large-scale datasets featuring formality annotation were introduced for multiple languages: GYAFC and X-FORMAL. However, they have primarily been used to train style transfer models. At the same time, detecting text formality on its own may also be a useful application. This work presents what is, to our knowledge, the first systematic study of formality detection methods, covering statistical, neural, and Transformer-based machine learning approaches, and releases the best-performing models for public use. We conducted three types of experiments: monolingual, multilingual, and cross-lingual. The study shows that the Char BiLSTM model outperforms Transformer-based ones on the monolingual and multilingual formality classification tasks, while Transformer-based classifiers are more stable under cross-lingual knowledge transfer.
- Anthology ID:
- 2023.ranlp-1.31
- Volume:
- Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing
- Month:
- September
- Year:
- 2023
- Address:
- Varna, Bulgaria
- Editors:
- Ruslan Mitkov, Galia Angelova
- Venue:
- RANLP
- Publisher:
- INCOMA Ltd., Shoumen, Bulgaria
- Pages:
- 274–284
- URL:
- https://aclanthology.org/2023.ranlp-1.31
- Cite (ACL):
- Daryna Dementieva, Nikolay Babakov, and Alexander Panchenko. 2023. Detecting Text Formality: A Study of Text Classification Approaches. In Proceedings of the 14th International Conference on Recent Advances in Natural Language Processing, pages 274–284, Varna, Bulgaria. INCOMA Ltd., Shoumen, Bulgaria.
- Cite (Informal):
- Detecting Text Formality: A Study of Text Classification Approaches (Dementieva et al., RANLP 2023)
- PDF:
- https://preview.aclanthology.org/emnlp-22-attachments/2023.ranlp-1.31.pdf