Abstract
A growing swath of NLP research is tackling problems related to generating long text, including tasks such as open-ended story generation, summarization, dialogue, and more. However, we currently lack appropriate tools to evaluate these long outputs of generation models: classic automatic metrics such as ROUGE have been shown to perform poorly, and newer learned metrics do not necessarily work well for all tasks and domains of text. Human rating and error analysis remain a crucial component of any evaluation of long text generation. In this paper, we introduce FALTE, a web-based annotation toolkit designed to address this shortcoming. Our tool allows researchers to collect fine-grained judgments of text quality from crowdworkers using an error taxonomy specific to the downstream task. Using the task interface, annotators can select and assign error labels to text span selections in an incremental paragraph-level annotation workflow. The latter functionality is designed to simplify the document-level task into smaller units and reduce cognitive load on the annotators. Our tool has previously been used to run a large-scale annotation study that evaluates the coherence of long generated summaries, demonstrating its utility.
- Anthology ID: 2022.emnlp-demos.35
- Volume: Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
- Month: December
- Year: 2022
- Address: Abu Dhabi, UAE
- Editors: Wanxiang Che, Ekaterina Shutova
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 351–358
- URL: https://aclanthology.org/2022.emnlp-demos.35
- DOI: 10.18653/v1/2022.emnlp-demos.35
- Cite (ACL): Tanya Goyal, Junyi Jessy Li, and Greg Durrett. 2022. FALTE: A Toolkit for Fine-grained Annotation for Long Text Evaluation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 351–358, Abu Dhabi, UAE. Association for Computational Linguistics.
- Cite (Informal): FALTE: A Toolkit for Fine-grained Annotation for Long Text Evaluation (Goyal et al., EMNLP 2022)
- PDF: https://preview.aclanthology.org/dois-2013-emnlp/2022.emnlp-demos.35.pdf