Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks
Tristan Thrush, Kushal Tirumala, Anmol Gupta, Max Bartolo, Pedro Rodriguez, Tariq Kane, William Gaviria Rojas, Peter Mattson, Adina Williams, Douwe Kiela
Abstract
We introduce Dynatask: an open-source system for setting up custom NLP tasks that aims to greatly lower the technical knowledge and effort required for hosting and evaluating state-of-the-art NLP models, as well as for conducting model-in-the-loop data collection with crowdworkers. Dynatask is integrated with Dynabench, a research platform for rethinking benchmarking in AI that facilitates human-and-model-in-the-loop data collection and evaluation. To create a task, users only need to write a short task configuration file, from which the relevant web interfaces and model hosting infrastructure are automatically generated. The system is available at https://dynabench.org/ and the full library can be found at https://github.com/facebookresearch/dynabench.
- Anthology ID: 2022.acl-demo.17
- Volume: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations
- Month: May
- Year: 2022
- Address: Dublin, Ireland
- Editors: Valerio Basile, Zornitsa Kozareva, Sanja Stajner
- Venue: ACL
- Publisher: Association for Computational Linguistics
- Pages: 174–181
- URL: https://aclanthology.org/2022.acl-demo.17
- DOI: 10.18653/v1/2022.acl-demo.17
- Cite (ACL): Tristan Thrush, Kushal Tirumala, Anmol Gupta, Max Bartolo, Pedro Rodriguez, Tariq Kane, William Gaviria Rojas, Peter Mattson, Adina Williams, and Douwe Kiela. 2022. Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 174–181, Dublin, Ireland. Association for Computational Linguistics.
- Cite (Informal): Dynatask: A Framework for Creating Dynamic AI Benchmark Tasks (Thrush et al., ACL 2022)
- PDF: https://preview.aclanthology.org/naacl24-info/2022.acl-demo.17.pdf
- Code: facebookresearch/dynabench
- Data: ANLI, AdversarialQA, GLUE