Interpreting Style Representations via Style-Eliciting Prompts

Junghwan Kim; David Jurgens


Abstract

Style representation learning is a powerful tool for authorship analysis and writing-style modeling, yet the latent nature of learned representations makes them difficult to interpret. Recent work has attempted to explain these representations by generating natural language descriptions with large language models (LLMs) conditioned on input text. However, such descriptions are prone to the LLM’s biases and hallucinations, and they lack an explicit objective and practical utility. In this work, we propose a novel framework for interpreting style representations through style-eliciting prompts: natural language instructions designed to steer LLMs to generate text that reflects specific stylistic attributes. We curate 1,010 distinct style features spanning 26 stylistic categories and construct a dataset by prompting an LLM to generate text conditioned on these features. Using this data, we train a decoder that generates a style prompt from the style representation of the generated text. We evaluate our approach on three tasks: (1) recovering the original style prompts from generated text, (2) generating text in the same style using the recovered prompts, and (3) steering LLM outputs to match the style of human-written texts. Experiments show that our method consistently outperforms strong baselines that directly prompt LLMs with the target text, achieving superior performance in both style description and style imitation. These results highlight that style-eliciting prompts can provide a practical and interpretable interface to the stylistic information encoded in style representations.
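The data flow described in the abstract (style feature → style-eliciting prompt → LLM-generated text → style representation → decoder target) can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code: the prompt template, the `StyleFeature`/`TrainingPair` types, and the `generate` and `encode` callables are assumptions standing in for a real LLM and a real style encoder, and no decoder is actually trained here.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class StyleFeature:
    category: str      # one of the stylistic categories (the paper reports 26)
    description: str   # natural-language description of one style feature


@dataclass(frozen=True)
class TrainingPair:
    text: str          # text the LLM generated under the style prompt
    style_prompt: str  # the prompt that elicited it (the decoder's target)


def make_style_prompt(feature: StyleFeature) -> str:
    # Hypothetical template; the paper's actual prompt wording is not given here.
    return (f"Write a short paragraph in the following style "
            f"({feature.category}): {feature.description}.")


def build_dataset(features: Sequence[StyleFeature],
                  generate: Callable[[str], str],
                  samples_per_feature: int = 1) -> List[TrainingPair]:
    """Prompt the (stand-in) LLM once per sample for every style feature."""
    pairs: List[TrainingPair] = []
    for feature in features:
        prompt = make_style_prompt(feature)
        for _ in range(samples_per_feature):
            pairs.append(TrainingPair(text=generate(prompt), style_prompt=prompt))
    return pairs


def to_decoder_examples(pairs: Sequence[TrainingPair],
                        encode: Callable[[str], List[float]]
                        ) -> List[Tuple[List[float], str]]:
    """Decoder input: style representation of the text; target: its style prompt."""
    return [(encode(p.text), p.style_prompt) for p in pairs]


# Toy stand-ins (hypothetical): a real setup would call an LLM and a style encoder.
def fake_generate(prompt: str) -> str:
    return f"<text generated for: {prompt}>"


def fake_encode(text: str) -> List[float]:
    return [float(len(text)), float(text.count("!"))]


features = [
    StyleFeature("formality", "formal register with no contractions"),
    StyleFeature("punctuation", "frequent exclamation marks"),
]
pairs = build_dataset(features, fake_generate, samples_per_feature=2)
examples = to_decoder_examples(pairs, fake_encode)
print(len(pairs), examples[0][1])
```

Keeping the prompt as the decoder's target is what lets the recovered output be fed back to an LLM for style imitation, which is the second and third evaluation task in the abstract.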

Anthology ID: 2026.findings-acl.2039
Volume: Findings of the Association for Computational Linguistics: ACL 2026
Month: July
Year: 2026
Address: San Diego, California, United States
Editors: Maria Liakata, Viviane P. Moreira, Jiajun Zhang, David Jurgens
Venue: Findings
Publisher: Association for Computational Linguistics
Pages: 41030–41048
URL: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2039/
Cite (ACL): Junghwan Kim and David Jurgens. 2026. Interpreting Style Representations via Style-Eliciting Prompts. In Findings of the Association for Computational Linguistics: ACL 2026, pages 41030–41048, San Diego, California, United States. Association for Computational Linguistics.
Cite (Informal): Interpreting Style Representations via Style-Eliciting Prompts (Kim & Jurgens, Findings 2026)
PDF: https://preview.aclanthology.org/ingest-acl/2026.findings-acl.2039.pdf
Checklist: 2026.findings-acl.2039.checklist.pdf
