Multi-source synthetic treebank creation for improved cross-lingual dependency parsing
Francis Tyers, Mariya Sheyanova, Aleksandra Martynova, Pavel Stepachev, Konstantin Vinogorodskiy
Abstract
This paper describes a method of creating synthetic treebanks for cross-lingual dependency parsing using a combination of machine translation (including pivot translation), annotation projection and the spanning tree algorithm. Sentences are first automatically translated from a lesser-resourced language to a number of related highly-resourced languages, parsed and then the annotations are projected back to the lesser-resourced language, leading to multiple trees for each sentence from the lesser-resourced language. The final treebank is created by merging the possible trees into a graph and running the spanning tree algorithm to vote for the best tree for each sentence. We present experiments aimed at parsing Faroese using a combination of Danish, Swedish and Norwegian. In a similar experimental setup to the CoNLL 2018 shared task on dependency parsing we report state-of-the-art results on dependency parsing for Faroese using an off-the-shelf parser.
- Anthology ID:
- W18-6017
- Volume:
- Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)
- Month:
- November
- Year:
- 2018
- Address:
- Brussels, Belgium
- Editors:
- Marie-Catherine de Marneffe, Teresa Lynn, Sebastian Schuster
- Venue:
- UDW
- Publisher:
- Association for Computational Linguistics
- Pages:
- 144–150
- URL:
- https://preview.aclanthology.org/build-pipeline-with-new-library/W18-6017/
- DOI:
- 10.18653/v1/W18-6017
- Cite (ACL):
- Francis Tyers, Mariya Sheyanova, Aleksandra Martynova, Pavel Stepachev, and Konstantin Vinogorodskiy. 2018. Multi-source synthetic treebank creation for improved cross-lingual dependency parsing. In Proceedings of the Second Workshop on Universal Dependencies (UDW 2018), pages 144–150, Brussels, Belgium. Association for Computational Linguistics.
- Cite (Informal):
- Multi-source synthetic treebank creation for improved cross-lingual dependency parsing (Tyers et al., UDW 2018)
- PDF:
- https://preview.aclanthology.org/build-pipeline-with-new-library/W18-6017.pdf
- Code:
- ftyers/cross-lingual-parsing
- Data:
- Universal Dependencies
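
The voting step summarised in the abstract (merge the projected trees into a graph, then run a spanning tree algorithm) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper or its code: each tree is a list of head indices (0 is the artificial root), an edge is weighted by the number of projected trees that contain it, and the spanning tree step is a plain Chu-Liu/Edmonds maximum arborescence. The function names `vote_edges`, `max_arborescence` and `merge_trees` are hypothetical, and the sketch does not enforce a single root dependent.

```python
from collections import Counter

ROOT = 0  # artificial root node; tokens are numbered from 1


def vote_edges(trees):
    """Count how many projected trees contain each (head, dependent) edge."""
    votes = Counter()
    for heads in trees:
        for dep, head in enumerate(heads, start=1):
            votes[(head, dep)] += 1
    return votes


def _find_cycle(head_of):
    """Return the nodes of one cycle in a {dependent: head} map, or None."""
    for start in head_of:
        path, seen, node = [], set(), start
        while node in head_of and node not in seen:
            seen.add(node)
            path.append(node)
            node = head_of[node]
        if node in seen:
            return path[path.index(node):]
    return None


def max_arborescence(nodes, scores, root=ROOT):
    """Chu-Liu/Edmonds. nodes: list of ids; scores: {(head, dep): weight}."""
    # Step 1: pick the best incoming edge for every non-root node.
    best = {}
    for (h, d), w in scores.items():
        if d == root or h == d:
            continue
        if d not in best or w > scores[(best[d], d)]:
            best[d] = h
    cycle = _find_cycle(best)
    if cycle is None:
        return best
    # Step 2: contract the cycle into a fresh node c and rescore edges.
    cyc, c = set(cycle), max(nodes) + 1
    new_scores, origin = {}, {}
    for (h, d), w in scores.items():
        if h in cyc and d in cyc:
            continue
        if d in cyc:    # edge entering the cycle: cost of breaking it at d
            key, adj = (h, c), w - scores[(best[d], d)]
        elif h in cyc:  # edge leaving the cycle
            key, adj = (c, d), w
        else:
            key, adj = (h, d), w
        if key not in new_scores or adj > new_scores[key]:
            new_scores[key], origin[key] = adj, (h, d)
    new_nodes = [n for n in nodes if n not in cyc] + [c]
    sub = max_arborescence(new_nodes, new_scores, root)
    # Step 3: expand the contracted node back into original edges.
    result = {}
    for d, h in sub.items():
        orig_head, orig_dep = origin[(h, d)]
        result[orig_dep] = orig_head
    for d in cycle:
        result.setdefault(d, best[d])
    return result


def merge_trees(trees):
    """Vote over several head lists for one sentence; return one head list."""
    n = len(trees[0])
    heads = max_arborescence(list(range(n + 1)), dict(vote_edges(trees)))
    return [heads[d] for d in range(1, n + 1)]


# Three hypothetical projected trees for a three-token sentence.
projected = [[2, 0, 2], [2, 0, 1], [0, 1, 2]]
merged = merge_trees(projected)
```

In this toy input, each edge of the first tree is supported by two of the three projections, so the merged result coincides with it. A weighting that also reflects translation or alignment quality would only change `vote_edges`.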