Abstract
This paper introduces an algorithm to convert Universal Dependencies (UD) treebanks to Combinatory Categorial Grammar (CCG) treebanks. As CCG encodes almost all grammatical information into the lexicon, obtaining a high-quality CCG derivation from a dependency tree is a challenging task. Our algorithm relies on hand-crafted rules to assign categories to constituents, and a non-statistical parser to derive full CCG parses given the assigned categories. To evaluate our converted treebanks, we perform lexical, sentential, and syntactic rule coverage analysis, as well as CCG parsing experiments. Finally, we discuss how our method handles complex constructions, and propose possible future extensions.
- Anthology ID:
- 2022.lrec-1.560
- Volume:
- Proceedings of the Thirteenth Language Resources and Evaluation Conference
- Month:
- June
- Year:
- 2022
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 5220–5233
- URL:
- https://aclanthology.org/2022.lrec-1.560
- Cite (ACL):
- Tu-Anh Tran and Yusuke Miyao. 2022. Development of a Multilingual CCG Treebank via Universal Dependencies Conversion. In Proceedings of the Thirteenth Language Resources and Evaluation Conference, pages 5220–5233, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Development of a Multilingual CCG Treebank via Universal Dependencies Conversion (Tran & Miyao, LREC 2022)
- PDF:
- https://preview.aclanthology.org/nschneid-patch-1/2022.lrec-1.560.pdf
- Data:
- Penn Treebank, Universal Dependencies