Abstract
This paper identifies novel characteristics necessary to successfully represent multiple streams of natural language information from speech and text simultaneously, and proposes a multi-tiered system, centered around a declarative configuration, that implements these characteristics. The system makes incremental extension easy through composable workflows of loosely coupled extensions, or plugins, so that simple initial systems can be extended to accommodate rich representations while maintaining high data integrity. Key to this approach is leveraging established tools and technologies. We demonstrate the system using a small example.
- Anthology ID:
- 2022.law-1.20
- Volume:
- Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Sameer Pradhan, Sandra Kuebler
- Venue:
- LAW
- SIG:
- SIGANN
- Publisher:
- European Language Resources Association
- Pages:
- 170–181
- URL:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2022.law-1.20/
- Cite (ACL):
- Sameer Pradhan and Mark Liberman. 2022. GRAIL—Generalized Representation and Aggregation of Information Layers. In Proceedings of the 16th Linguistic Annotation Workshop (LAW-XVI) within LREC2022, pages 170–181, Marseille, France. European Language Resources Association.
- Cite (Informal):
- GRAIL—Generalized Representation and Aggregation of Information Layers (Pradhan & Liberman, LAW 2022)
- PDF:
- https://preview.aclanthology.org/build-pipeline-with-new-library/2022.law-1.20.pdf
- Data:
- Penn Treebank
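To make the abstract's idea of "composable workflows of loosely coupled plugins" driven by "a declarative configuration" more concrete, here is a minimal Python sketch. It is not GRAIL's actual API or data model. Everything in it is a hypothetical illustration: the `Plugin` record, the `register` decorator, the `CONFIG` dictionary, the `validate`/`run` functions, and the toy `tokenize`/`pos` plugins. The sketch assumes that a document can be treated as a mapping from layer names to layer contents and that each plugin declares which layers it requires and which it produces. Before anything runs, the workflow is checked against those declarations, which is one simple way to protect data integrity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical simplification: a document is a mapping from layer name to content.
Layers = Dict[str, object]


@dataclass
class Plugin:
    name: str
    requires: List[str]  # layers that must already exist
    produces: List[str]  # layers this plugin adds
    run: Callable[[Layers], Layers]


# Hypothetical plugin registry; plugins never call each other directly.
REGISTRY: Dict[str, Plugin] = {}


def register(name: str, requires: List[str], produces: List[str]):
    """Decorator that adds a function to the registry as a plugin."""
    def decorator(fn: Callable[[Layers], Layers]) -> Callable[[Layers], Layers]:
        REGISTRY[name] = Plugin(name, list(requires), list(produces), fn)
        return fn
    return decorator


@register("tokenize", requires=["text"], produces=["tokens"])
def tokenize(layers: Layers) -> Layers:
    # Toy whitespace tokenizer.
    return {"tokens": str(layers["text"]).split()}


@register("pos", requires=["tokens"], produces=["pos"])
def pos_tag(layers: Layers) -> Layers:
    # Toy tagger: tags digit-only tokens as NUM and everything else as X.
    return {"pos": ["NUM" if t.isdigit() else "X" for t in layers["tokens"]]}


# Declarative workflow description; in practice it might be loaded from JSON or YAML.
CONFIG = {"input_layers": ["text"], "workflow": ["tokenize", "pos"]}


def validate(config: dict) -> List[Plugin]:
    """Check declared dependencies before running, without executing any plugin."""
    available = set(config["input_layers"])
    plugins = []
    for step in config["workflow"]:
        if step not in REGISTRY:
            raise ValueError(f"unknown plugin: {step}")
        plugin = REGISTRY[step]
        missing = [r for r in plugin.requires if r not in available]
        if missing:
            raise ValueError(f"{step} requires missing layers {missing}")
        clash = [p for p in plugin.produces if p in available]
        if clash:
            raise ValueError(f"{step} would overwrite existing layers {clash}")
        available.update(plugin.produces)
        plugins.append(plugin)
    return plugins


def run(config: dict, layers: Layers) -> Layers:
    """Validate the workflow, then apply each plugin in order, adding its layers."""
    absent = set(config["input_layers"]) - set(layers)
    if absent:
        raise ValueError(f"input is missing declared layers {sorted(absent)}")
    result = dict(layers)
    for plugin in validate(config):
        output = plugin.run(result)
        if set(output) != set(plugin.produces):
            raise ValueError(f"{plugin.name} produced undeclared layers {sorted(output)}")
        result.update(output)
    return result


if __name__ == "__main__":
    print(run(CONFIG, {"text": "we saw 3 dogs"}))
```

Calling `run(CONFIG, {"text": "we saw 3 dogs"})` adds a `tokens` layer and a `pos` layer. A workflow that lists `pos` before `tokenize` is rejected by `validate` before any plugin executes. Extending the system in this sketch means registering another plugin and appending its name to the configuration. Existing plugins do not change.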