Abstract
Semi-structured explanation depicts the implicit process of a reasoner with an explicit representation. This explanation highlights how the information available in a specific query is utilised and supplemented with information the reasoner produces from its internal weights to generate an answer. Despite recent improvements in the generative capabilities of language models, producing structured explanations to verify a model’s true reasoning capabilities remains a challenge. This issue is particularly pronounced for not-so-large LMs (e.g., FLAN-T5-XXL). In this work, we first underscore the limitations of supervised fine-tuning (SFT) in tackling this challenge, and then introduce a carefully crafted reward engineering method in reinforcement learning (RL) to better address this problem. We investigate multiple reward aggregation methods and provide a detailed discussion that sheds light on the promising potential of RL for future research. Our proposed method achieves new state-of-the-art results on two semi-structured explanation generation benchmarks (ExplaGraph and COPA-SSE).

- Anthology ID: 2024.findings-eacl.41
- Volume: Findings of the Association for Computational Linguistics: EACL 2024
- Month: March
- Year: 2024
- Address: St. Julian’s, Malta
- Editors: Yvette Graham, Matthew Purver
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 589–602
- URL: https://aclanthology.org/2024.findings-eacl.41
- Cite (ACL): Jiuzhou Han, Wray Buntine, and Ehsan Shareghi. 2024. Reward Engineering for Generating Semi-structured Explanation. In Findings of the Association for Computational Linguistics: EACL 2024, pages 589–602, St. Julian’s, Malta. Association for Computational Linguistics.
- Cite (Informal): Reward Engineering for Generating Semi-structured Explanation (Han et al., Findings 2024)
- PDF: https://preview.aclanthology.org/ingest-bitext-workshop/2024.findings-eacl.41.pdf