ByteBreaker@DravidianLangTech 2026: XLM-RoBERTa Large with Sliding-Window Chunking and Top-K Mean Pooling for Writing Style Classification

Chava Srinivasa Sai; R Vinay Kumar; Jigeesha Sai Surapaneni; Chava Shanmukha Sai

Abstract

Identifying the writing style of a long text is difficult because style can vary across the sections of a document. In addition, the styles themselves often differ only in small, nuanced ways. In this paper, we describe ByteBreaker, the system we built for the Prompt Recovery for LLM Shared Task at DravidianLangTech@ACL-2026. The goal is to identify the writing style of a document written by a large language model (LLM). The candidate styles are Authoritative, Formal, Humorous, Informal, Inspiring, Optimistic, Persuasive, Pessimistic, and Serious. Because many documents exceed the 512-token input limit of transformer models, we adopt a sliding-window method that splits each document into overlapping 512-token chunks with a stride of 256 tokens. We fine-tune XLM-RoBERTa Large on only the rewritten “CHANGE STYLE” text, as it carries more distinct stylistic indicators. For prediction, we apply Top-K mean pooling to the chunk-level predictions, which puts more emphasis on the confident chunks rather than treating all chunks equally. To improve consistency, we trained the model with five distinct random seeds and made three submissions: a weighted ensemble (Run 1), a mean-guided single model (Run 2), and a Top-K-guided single model (Run 3). Among the three, Run 3 reached the highest macro F1 score of 0.3306, while Run 1 achieved the best accuracy (0.3256) with a macro F1 of 0.3290.
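The chunking and pooling steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, the value of `k` is not given in the abstract and is chosen here only for illustration, and "confidence" is assumed to mean a chunk's highest class probability. The window of 512 tokens and stride of 256 tokens come from the abstract.

```python
def sliding_window_chunks(token_ids, window=512, stride=256):
    """Split a token sequence into overlapping chunks.

    Each chunk starts `stride` tokens after the previous one, so
    consecutive chunks overlap by `window - stride` tokens.
    """
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    if len(token_ids) <= window:
        return [list(token_ids)]  # short documents fit in one chunk
    chunks = []
    start = 0
    while True:
        chunks.append(list(token_ids[start:start + window]))
        if start + window >= len(token_ids):
            break  # this chunk already reaches the end of the document
        start += stride
    return chunks


def top_k_mean_pool(chunk_probs, k=3):
    """Average the class probabilities of the k most confident chunks.

    Assumption: a chunk's confidence is its largest class probability.
    The abstract does not specify the ranking criterion or k.
    """
    if not chunk_probs:
        raise ValueError("need at least one chunk prediction")
    if k <= 0:
        raise ValueError("k must be positive")
    ranked = sorted(chunk_probs, key=max, reverse=True)
    top = ranked[:k]  # fewer than k chunks: use all of them
    n_classes = len(top[0])
    return [sum(p[c] for p in top) / len(top) for c in range(n_classes)]


# Usage with a toy 1000-token document and made-up two-class predictions.
chunks = sliding_window_chunks(list(range(1000)))
pooled = top_k_mean_pool([[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]], k=2)
predicted_class = max(range(len(pooled)), key=pooled.__getitem__)
```

In the toy example the uncertain chunk `[0.5, 0.5]` is dropped before averaging, which is the sense in which Top-K pooling favors confident chunks over plain mean pooling. A real pipeline would also reserve room in each window for the model's special tokens, which this sketch ignores.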

Anthology ID: 2026.dravidianlangtech-1.19
Volume: Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages
Month: July
Year: 2026
Address: Underline (Virtual)
Editors: Bharathi Raja Chakravarthi, Ruba Priyadharshini, Anand Kumar Madasamy, Sajeetha Thavareesan, Saranya Rajiakodi, Subalalitha Navaneethakrishnan, Dhivya Chinnappa, Balasubramanian Palani, Malliga Subramanian, Kogilavani Shanmugavadivel, Ratnavel Rajalakshmi
Venues: DravidianLangTech | WS
Publisher: Association for Computational Linguistics
Pages: 158–162
URL: https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.19/
Cite (ACL): Chava Srinivasa Sai, R Vinay Kumar, Jigeesha Sai Surapaneni, and Chava Shanmukha Sai. 2026. ByteBreaker@DravidianLangTech 2026: XLM-RoBERTa Large with Sliding-Window Chunking and Top-K Mean Pooling for Writing Style Classification. In Proceedings of the Sixth Workshop on Speech, Vision, and Language Technologies for Dravidian Languages, pages 158–162, Underline (Virtual). Association for Computational Linguistics.
Cite (Informal): ByteBreaker@DravidianLangTech 2026: XLM-RoBERTa Large with Sliding-Window Chunking and Top-K Mean Pooling for Writing Style Classification (Sai et al., DravidianLangTech 2026)
PDF: https://preview.aclanthology.org/ingest-acl-workshops/2026.dravidianlangtech-1.19.pdf
