Multi-source synthetic treebank creation for improved cross-lingual dependency parsing
Francis Tyers, Mariya Sheyanova, Aleksandra Martynova, Pavel Stepachev, Konstantin Vinogorodskiy
Abstract
This paper describes a method of creating synthetic treebanks for cross-lingual dependency parsing using a combination of machine translation (including pivot translation), annotation projection and the spanning tree algorithm. Sentences are first automatically translated from a lesser-resourced language into a number of related highly-resourced languages and parsed; the annotations are then projected back onto the lesser-resourced language, yielding multiple trees for each sentence in that language. The final treebank is created by merging the candidate trees into a graph and running the spanning tree algorithm to vote for the best tree for each sentence. We present experiments aimed at parsing Faroese using a combination of Danish, Swedish and Norwegian. In an experimental setup similar to that of the CoNLL 2018 shared task on dependency parsing, we report state-of-the-art results on dependency parsing for Faroese using an off-the-shelf parser.
 - Anthology ID:
 - W18-6017
 - Volume:
 - Proceedings of the Second Workshop on Universal Dependencies (UDW 2018)
 - Month:
 - November
 - Year:
 - 2018
 - Address:
 - Brussels, Belgium
 - Venue:
 - UDW
 - Publisher:
 - Association for Computational Linguistics
 - Pages:
 - 144–150
 - URL:
 - https://aclanthology.org/W18-6017
 - DOI:
 - 10.18653/v1/W18-6017
 - Cite (ACL):
 - Francis Tyers, Mariya Sheyanova, Aleksandra Martynova, Pavel Stepachev, and Konstantin Vinogorodskiy. 2018. Multi-source synthetic treebank creation for improved cross-lingual dependency parsing. In Proceedings of the Second Workshop on Universal Dependencies (UDW 2018), pages 144–150, Brussels, Belgium. Association for Computational Linguistics.
 - Cite (Informal):
 - Multi-source synthetic treebank creation for improved cross-lingual dependency parsing (Tyers et al., UDW 2018)
 - PDF:
 - https://preview.aclanthology.org/remove-xml-comments/W18-6017.pdf
 - Code:
 - ftyers/cross-lingual-parsing
 - Data:
 - Universal Dependencies
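
The voting step described in the abstract (merging several projected trees into one graph and extracting a spanning tree) can be illustrated with a minimal sketch. This is not the authors' code. It assumes that every projected tree for a sentence uses the same tokenisation, that each tree is given as a list of head indices (0 for the root, as in the HEAD column of CoNLL-U), that each proposed arc is weighted by the number of source trees containing it, and that the spanning tree is found with the Chu-Liu/Edmonds maximum arborescence algorithm. Dependency labels and the single-root constraint are ignored, and the names find_cycle, max_arborescence and vote_tree are hypothetical.

```python
from collections import Counter

ROOT = 0  # artificial root node; tokens are numbered from 1


def find_cycle(head):
    """Return the nodes of one cycle in a {dependent: head} map, or None."""
    for start in head:
        path, seen, node = [], set(), start
        while node in head and node not in seen:
            seen.add(node)
            path.append(node)
            node = head[node]
        if node in seen:  # walked back into the current path: a cycle
            return path[path.index(node):]
    return None


def max_arborescence(scores):
    """Chu-Liu/Edmonds. scores is {(head, dep): weight}; returns {dep: head}."""
    edges = {(h, d): w for (h, d), w in scores.items() if d != ROOT and h != d}
    # Greedy step: keep the highest-scoring incoming arc for every node.
    best = {}
    for (h, d), w in edges.items():
        if d not in best or w > edges[(best[d], d)]:
            best[d] = h
    cycle = find_cycle(best)
    if cycle is None:
        return best
    # Contract the cycle into one node and rescore the arcs entering it.
    members, merged = set(cycle), ("cycle", tuple(cycle))
    new_scores, origin = {}, {}
    for (h, d), w in edges.items():
        if h in members and d in members:
            continue
        if d in members:
            key, adj = (h, merged), w - edges[(best[d], d)]
        elif h in members:
            key, adj = (merged, d), w
        else:
            key, adj = (h, d), w
        if key not in new_scores or adj > new_scores[key]:
            new_scores[key], origin[key] = adj, (h, d)
    # Solve the contracted problem, then map its arcs back to original arcs.
    result = {}
    for d, h in max_arborescence(new_scores).items():
        orig_head, orig_dep = origin[(h, d)]
        result[orig_dep] = orig_head
    # Cycle nodes keep their cycle arc, except the one the entering arc replaced.
    for d in cycle:
        result.setdefault(d, best[d])
    return result


def vote_tree(trees):
    """Merge head lists (one per source language) and return the voted heads."""
    votes = Counter()
    for heads in trees:
        for dep, head in enumerate(heads, start=1):
            votes[(head, dep)] += 1  # one vote per tree proposing this arc
    best = max_arborescence(dict(votes))
    return [best[dep] for dep in range(1, len(trees[0]) + 1)]


if __name__ == "__main__":
    # Three hypothetical projected trees for a four-token sentence.
    projected = [[2, 0, 2, 3], [2, 0, 4, 2], [2, 0, 2, 2]]
    print(vote_tree(projected))  # [2, 0, 2, 2]
```

With unit votes, ties between equally supported arcs are broken by the order in which the source trees are listed. A weighting that reflects, for example, translation or parser quality per source language would be one way to adapt the sketch, but that is an assumption rather than something stated in the abstract.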