Generating Scientific Definitions with Controllable Complexity

Tal August, Katharina Reinecke, Noah A. Smith


Abstract
Unfamiliar terminology and complex language can present barriers to understanding science. Natural language processing stands to help address these issues by automatically defining unfamiliar terms. We introduce a new task and dataset for defining scientific terms and controlling the complexity of generated definitions as a way of adapting to a specific reader’s background knowledge. We test four definition generation methods for this new task, finding that a sequence-to-sequence approach is most successful. We then explore the version of the task in which definitions are generated at a target complexity level. We introduce a novel reranking approach and find in human evaluations that it offers superior fluency while also controlling complexity, compared to several controllable generation baselines.
Anthology ID:
2022.acl-long.569
Volume:
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Month:
May
Year:
2022
Address:
Dublin, Ireland
Editors:
Smaranda Muresan, Preslav Nakov, Aline Villavicencio
Venue:
ACL
Publisher:
Association for Computational Linguistics
Pages:
8298–8317
URL:
https://aclanthology.org/2022.acl-long.569
DOI:
10.18653/v1/2022.acl-long.569
Cite (ACL):
Tal August, Katharina Reinecke, and Noah A. Smith. 2022. Generating Scientific Definitions with Controllable Complexity. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 8298–8317, Dublin, Ireland. Association for Computational Linguistics.
Cite (Informal):
Generating Scientific Definitions with Controllable Complexity (August et al., ACL 2022)
PDF:
https://preview.aclanthology.org/naacl24-info/2022.acl-long.569.pdf
Code:
talaugust/definition-complexity