Embracing Non-Traditional Linguistic Resources for Low-resource Language Name Tagging
Boliang Zhang, Di Lu, Xiaoman Pan, Ying Lin, Halidanmu Abudukelimu, Heng Ji, Kevin Knight
Abstract
Current supervised name tagging approaches are inadequate for most low-resource languages due to the lack of annotated data and actionable linguistic knowledge. All supervised learning methods (including deep neural networks (DNN)) are sensitive to noise and thus they are not quite portable without massive clean annotations. We found that the F-scores of DNN-based name taggers drop rapidly (20%-30%) when we replace clean manual annotations with noisy annotations in the training data. We propose a new solution to incorporate many non-traditional language universal resources that are readily available but rarely explored in the Natural Language Processing (NLP) community, such as the World Atlas of Linguistic Structure, CIA names, PanLex and survival guides. We acquire and encode various types of non-traditional linguistic resources into a DNN name tagger. Experiments on three low-resource languages show that feeding linguistic knowledge can make DNN significantly more robust to noise, achieving 8%-22% absolute F-score gains on name tagging without using any human annotation- Anthology ID:
- I17-1037
- Volume:
- Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
- Month:
- November
- Year:
- 2017
- Address:
- Taipei, Taiwan
- Editors:
- Greg Kondrak, Taro Watanabe
- Venue:
- IJCNLP
- SIG:
- Publisher:
- Asian Federation of Natural Language Processing
- Note:
- Pages:
- 362–372
- Language:
- URL:
- https://aclanthology.org/I17-1037
- DOI:
- Cite (ACL):
- Boliang Zhang, Di Lu, Xiaoman Pan, Ying Lin, Halidanmu Abudukelimu, Heng Ji, and Kevin Knight. 2017. Embracing Non-Traditional Linguistic Resources for Low-resource Language Name Tagging. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 362–372, Taipei, Taiwan. Asian Federation of Natural Language Processing.
- Cite (Informal):
- Embracing Non-Traditional Linguistic Resources for Low-resource Language Name Tagging (Zhang et al., IJCNLP 2017)
- PDF:
- https://preview.aclanthology.org/ingest-2024-clasp/I17-1037.pdf
- Data
- Panlex