Unlocking the Heterogeneous Landscape of Big Data NLP with DUUI
Alexander Leonhardt, Giuseppe Abrami, Daniel Baumartz, Alexander Mehler
Abstract
Automatic analysis of large corpora is a complex task, especially in terms of time efficiency. This complexity is increased by the fact that flexible, extensible text analysis requires the continuous integration of ever new tools. Since there are no adequate frameworks for these purposes in the field of NLP, and especially in the context of UIMA, that are not outdated or unusable for security reasons, we present a new approach to address the latter task: Docker Unified UIMA Interface (DUUI), a scalable, flexible, lightweight, and feature-rich framework for automatic distributed analysis of text corpora that leverages Big Data experience and virtualization with Docker. We evaluate DUUI’s communication approach against a state-of-the-art approach and demonstrate its outstanding behavior in terms of time efficiency, enabling the analysis of big text data.
- Anthology ID: 2023.findings-emnlp.29
- Volume: Findings of the Association for Computational Linguistics: EMNLP 2023
- Month: December
- Year: 2023
- Address: Singapore
- Editors: Houda Bouamor, Juan Pino, Kalika Bali
- Venue: Findings
- Publisher: Association for Computational Linguistics
- Pages: 385–399
- URL: https://aclanthology.org/2023.findings-emnlp.29
- DOI: 10.18653/v1/2023.findings-emnlp.29
- Cite (ACL): Alexander Leonhardt, Giuseppe Abrami, Daniel Baumartz, and Alexander Mehler. 2023. Unlocking the Heterogeneous Landscape of Big Data NLP with DUUI. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 385–399, Singapore. Association for Computational Linguistics.
- Cite (Informal): Unlocking the Heterogeneous Landscape of Big Data NLP with DUUI (Leonhardt et al., Findings 2023)
- PDF: https://preview.aclanthology.org/emnlp-22-attachments/2023.findings-emnlp.29.pdf