Many Languages, One Parser

Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, Noah A. Smith


Abstract
We train one multilingual model for dependency parsing and use it to parse sentences in several languages. The parsing model uses (i) multilingual word clusters and embeddings; (ii) token-level language information; and (iii) language-specific features (fine-grained POS tags). This input representation enables the parser not only to parse effectively in multiple languages, but also to generalize across languages based on linguistic universals and typological similarities, making it more effective to learn from limited annotations. Our parser’s performance compares favorably to strong baselines in a range of data scenarios, including when the target language has a large treebank, a small treebank, or no treebank for training.
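Below is a minimal sketch (not the authors' implementation) of the kind of token-level input representation the abstract describes: each token's vector combines a shared multilingual word embedding, a token-level language embedding, and a language-specific fine-grained POS embedding. All dimensions, lookup tables, and example words here are illustrative assumptions.

```python
# Hypothetical illustration of the three-part input representation;
# in the paper these components are pretrained or learned jointly.
import numpy as np

rng = np.random.default_rng(0)

# Assumed lookup tables (toy vocabularies, random vectors for illustration).
multilingual_word_emb = {w: rng.normal(size=64) for w in ["perro", "dog", "Hund"]}
language_emb = {lang: rng.normal(size=8) for lang in ["es", "en", "de"]}
fine_pos_emb = {pos: rng.normal(size=16) for pos in ["NOUN_sg", "NOUN_pl"]}

def token_input(word: str, lang: str, fine_pos: str) -> np.ndarray:
    """Concatenate the three components into one token input vector."""
    return np.concatenate([
        multilingual_word_emb[word],  # (i) multilingual word embedding
        language_emb[lang],           # (ii) token-level language information
        fine_pos_emb[fine_pos],       # (iii) language-specific fine-grained POS
    ])

vec = token_input("perro", "es", "NOUN_sg")
print(vec.shape)  # (88,) = 64 + 8 + 16
```

A single parser consuming such vectors can share parameters across languages while the language and POS components let it specialize where languages differ.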
Anthology ID:
Q16-1031
Volume:
Transactions of the Association for Computational Linguistics, Volume 4
Year:
2016
Address:
Cambridge, MA
Editors:
Lillian Lee, Mark Johnson, Kristina Toutanova
Venue:
TACL
Publisher:
MIT Press
Pages:
431–444
URL:
https://aclanthology.org/Q16-1031
DOI:
10.1162/tacl_a_00109
Cite (ACL):
Waleed Ammar, George Mulcaire, Miguel Ballesteros, Chris Dyer, and Noah A. Smith. 2016. Many Languages, One Parser. Transactions of the Association for Computational Linguistics, 4:431–444.
Cite (Informal):
Many Languages, One Parser (Ammar et al., TACL 2016)
PDF:
https://preview.aclanthology.org/ingest-bitext-workshop/Q16-1031.pdf
Code:
clab/language-universal-parser
Data:
Universal Dependencies