Abstract
Statistical and neural-network-based methods that compute their results by comparing a given text to be analyzed with a reference corpus assume that the reference corpus is complete and reliable enough. In this article, I conduct several experiments on an extract of the Open American National Corpus to verify this assumption.

- Anthology ID: W18-3802
- Volume: Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing
- Month: August
- Year: 2018
- Address: Santa Fe, New Mexico, USA
- Editors: Peter Machonis, Anabela Barreiro, Kristina Kocijan, Max Silberztein
- Venue: LR4NLP
- Publisher: Association for Computational Linguistics
- Pages: 2–11
- URL: https://preview.aclanthology.org/more-markup/W18-3802/
- Cite (ACL): Max Silberztein. 2018. Using Linguistic Resources to Evaluate the Quality of Annotated Corpora. In Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing, pages 2–11, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
- Cite (Informal): Using Linguistic Resources to Evaluate the Quality of Annotated Corpora (Silberztein, LR4NLP 2018)
- PDF: https://preview.aclanthology.org/more-markup/W18-3802.pdf