LingGen: Scalable Multi-Attribute Linguistic Control via Power-Law Masking

Mohamed Elgaar, Hadi Amiri


Abstract
We present LingGen, a controlled text generation model that allows fine-grained control over a large number of real-valued linguistic attributes. It encodes target attribute values with a dedicated linguistic attribute encoder and conditions the language model by injecting the resulting representation through the beginning-of-sequence (BOS) embeddings. To improve robustness when controlling different attribute subsets, we introduce P-MASKING, which samples per-example attribute masking rates from a truncated Pareto distribution during training. With between 1 and 40 control attributes, LingGen achieves the lowest average control error among evaluated methods, while remaining efficient at inference and receiving the highest fluency scores in human evaluation. Ablations show that Pareto-sampled masking and BOS-based injection are effective choices compared to alternative masking and integration variants.
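The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the shape parameter `alpha`, the bounds `low` and `high`, and the choice to mask `round(rate * n)` attributes chosen uniformly at random are all assumptions made for illustration. The abstract states only that per-example masking rates are drawn from a truncated Pareto distribution during training.

```python
import random


def sample_truncated_pareto(alpha, low, high, rng):
    """Draw one value from a Pareto(alpha) distribution truncated to [low, high].

    Uses inverse-CDF sampling. The truncated CDF is
    F(x) = (1 - (low / x) ** alpha) / (1 - (low / high) ** alpha).
    """
    u = rng.random()  # uniform in [0, 1)
    tail = 1.0 - (low / high) ** alpha
    return low * (1.0 - u * tail) ** (-1.0 / alpha)


def p_mask(attributes, rng, alpha=1.0, low=0.05, high=1.0):
    """Mask a random subset of attribute values for one training example.

    alpha, low, and high are hypothetical defaults, not values from the paper.
    Masked positions are replaced with None.
    """
    rate = sample_truncated_pareto(alpha, low, high, rng)
    n = len(attributes)
    n_masked = min(n, round(rate * n))
    masked_idx = set(rng.sample(range(n), n_masked))
    masked = [None if i in masked_idx else v for i, v in enumerate(attributes)]
    return masked, rate


if __name__ == "__main__":
    rng = random.Random(0)
    target_values = [0.5] * 40  # 40 placeholder attribute values
    masked, rate = p_mask(target_values, rng)
    kept = sum(v is not None for v in masked)
    print(f"masking rate {rate:.3f}, attributes kept {kept} of {len(masked)}")
```

Under these assumed settings most examples draw a low masking rate and keep most attributes, while a minority draw rates close to 1 and keep only a few, so one training run covers both large and small control subsets.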
Anthology ID:
2026.eacl-long.85
Volume:
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
March
Year:
2026
Address:
Rabat, Morocco
Editors:
Vera Demberg, Kentaro Inui, Lluís Marquez
Venue:
EACL
Publisher:
Association for Computational Linguistics
Pages:
1925–1942
URL:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.85/
Cite (ACL):
Mohamed Elgaar and Hadi Amiri. 2026. LingGen: Scalable Multi-Attribute Linguistic Control via Power-Law Masking. In Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1925–1942, Rabat, Morocco. Association for Computational Linguistics.
Cite (Informal):
LingGen: Scalable Multi-Attribute Linguistic Control via Power-Law Masking (Elgaar & Amiri, EACL 2026)
PDF:
https://preview.aclanthology.org/ingest-eacl/2026.eacl-long.85.pdf