Daniel Jorge Bernardo Ferreira
2025
A Framework for Fine-Grained Complexity Control in Health Answer Generation
Daniel Jorge Bernardo Ferreira
|
Tiago Almeida
|
Sérgio Matos
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 4: Student Research Workshop)
Health literacy plays a critical role in ensuring people can access, understand, and act on medical information. However, much of the health content available today is too complex for many people, and simplifying these texts manually is time-consuming and difficult to do at scale.To overcome this, we developed a new framework for automatically generating health answers at multiple, precisely controlled complexity levels.We began with a thorough analysis of 166 linguistic features, which we then refined into 13 key metrics that reliably differentiate between simple and complex medical texts. From these metrics, we derived a robust complexity scoring formula, combining them with weights learned from a logistic regression model. This formula allowed us to create a large, multi-level dataset of health question-answer pairs covering 21 distinct complexity levels, ranging from elementary patient-friendly explanations to highly technical summaries.Finally, we fine-tuned a Llama-3.1-8B-Instruct model using “control codes” on this dataset, giving users precise control over the complexity of the generated text and empowering them to select the level of detail and technicality they need.