LexiClean: An annotation tool for rapid multi-task lexical normalisation
Tyler Bikaun, Tim French, Melinda Hodkiewicz, Michael Stewart, Wei Liu
Abstract
NLP systems are often challenged by difficulties arising from noisy, non-standard, and domain-specific corpora. The task of lexical normalisation aims to standardise such corpora, but currently lacks suitable tools to acquire high-quality annotated data to support deep-learning-based approaches. In this paper, we present LexiClean, the first open-source web-based annotation tool for multi-task lexical normalisation. LexiClean’s main contribution is support for simultaneous in situ token-level modification and annotation that can be rapidly applied corpus-wide. We demonstrate the usefulness of our tool through a case study on two sets of noisy corpora derived from the specialised domain of industrial mining. We show that LexiClean allows for the rapid and efficient development of high-quality parallel corpora. A demo of our system is available at: https://youtu.be/P7_ooKrQPDU.
- Anthology ID: 2021.emnlp-demo.25
- Volume: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
- Month: November
- Year: 2021
- Address: Online and Punta Cana, Dominican Republic
- Editors: Heike Adel, Shuming Shi
- Venue: EMNLP
- Publisher: Association for Computational Linguistics
- Pages: 212–219
- URL: https://aclanthology.org/2021.emnlp-demo.25
- DOI: 10.18653/v1/2021.emnlp-demo.25
- Cite (ACL): Tyler Bikaun, Tim French, Melinda Hodkiewicz, Michael Stewart, and Wei Liu. 2021. LexiClean: An annotation tool for rapid multi-task lexical normalisation. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 212–219, Online and Punta Cana, Dominican Republic. Association for Computational Linguistics.
- Cite (Informal): LexiClean: An annotation tool for rapid multi-task lexical normalisation (Bikaun et al., EMNLP 2021)
- PDF: https://preview.aclanthology.org/ingest-2024-clasp/2021.emnlp-demo.25.pdf
- Code: nlp-tlp/lexiclean