Multi-Source Cross-Lingual Constituency Parsing
Hour Kaing, Chenchen Ding, Katsuhito Sudoh, Masao Utiyama, Eiichiro Sumita, Satoshi Nakamura
Abstract
Pretrained multilingual language models have become a key part of cross-lingual transfer for many natural language processing tasks, even those without bilingual information. This work further investigates the cross-lingual transfer ability of these models for constituency parsing and focuses on multi-source transfer. To address the problems of structure and label set diversity, we propose treebank normalization and the integration of typological features into the parsing model. We train the model on eight languages with diverse structures and apply transfer parsing to an additional six low-resource languages. The experimental results show that treebank normalization is essential for cross-lingual transfer performance and that the typological features yield further improvement. As a result, our approach improves the baseline F1 of multi-source transfer by 5 on average.
- Anthology ID:
- 2021.icon-main.41
- Volume:
- Proceedings of the 18th International Conference on Natural Language Processing (ICON)
- Month:
- December
- Year:
- 2021
- Address:
- National Institute of Technology Silchar, Silchar, India
- Editors:
- Sivaji Bandyopadhyay, Sobha Lalitha Devi, Pushpak Bhattacharyya
- Venue:
- ICON
- Publisher:
- NLP Association of India (NLPAI)
- Pages:
- 341–346
- URL:
- https://aclanthology.org/2021.icon-main.41
- Cite (ACL):
- Hour Kaing, Chenchen Ding, Katsuhito Sudoh, Masao Utiyama, Eiichiro Sumita, and Satoshi Nakamura. 2021. Multi-Source Cross-Lingual Constituency Parsing. In Proceedings of the 18th International Conference on Natural Language Processing (ICON), pages 341–346, National Institute of Technology Silchar, Silchar, India. NLP Association of India (NLPAI).
- Cite (Informal):
- Multi-Source Cross-Lingual Constituency Parsing (Kaing et al., ICON 2021)
- PDF:
- https://preview.aclanthology.org/proper-vol2-ingestion/2021.icon-main.41.pdf
- Data
- Penn Treebank