Embracing Non-Traditional Linguistic Resources for Low-resource Language Name Tagging

Boliang Zhang, Di Lu, Xiaoman Pan, Ying Lin, Halidanmu Abudukelimu, Heng Ji, Kevin Knight


Abstract
Current supervised name tagging approaches are inadequate for most low-resource languages due to the lack of annotated data and actionable linguistic knowledge. All supervised learning methods, including deep neural networks (DNNs), are sensitive to noise and thus do not port well without massive clean annotations. We found that the F-scores of DNN-based name taggers drop rapidly (20%-30%) when we replace clean manual annotations with noisy annotations in the training data. We propose a new solution that incorporates many non-traditional, language-universal resources that are readily available but rarely explored in the Natural Language Processing (NLP) community, such as the World Atlas of Language Structures, CIA names, PanLex and survival guides. We acquire various types of non-traditional linguistic resources and encode them into a DNN name tagger. Experiments on three low-resource languages show that feeding linguistic knowledge into the tagger can make DNNs significantly more robust to noise, achieving 8%-22% absolute F-score gains on name tagging without using any human annotation.
Anthology ID:
I17-1037
Volume:
Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers)
Month:
November
Year:
2017
Address:
Taipei, Taiwan
Editors:
Greg Kondrak, Taro Watanabe
Venue:
IJCNLP
Publisher:
Asian Federation of Natural Language Processing
Pages:
362–372
URL:
https://aclanthology.org/I17-1037
Cite (ACL):
Boliang Zhang, Di Lu, Xiaoman Pan, Ying Lin, Halidanmu Abudukelimu, Heng Ji, and Kevin Knight. 2017. Embracing Non-Traditional Linguistic Resources for Low-resource Language Name Tagging. In Proceedings of the Eighth International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 362–372, Taipei, Taiwan. Asian Federation of Natural Language Processing.
Cite (Informal):
Embracing Non-Traditional Linguistic Resources for Low-resource Language Name Tagging (Zhang et al., IJCNLP 2017)
PDF:
https://preview.aclanthology.org/ingest-2024-clasp/I17-1037.pdf
Data:
PanLex