Universal Dependencies for Japanese

Takaaki Tanaka, Yusuke Miyao, Masayuki Asahara, Sumire Uematsu, Hiroshi Kanayama, Shinsuke Mori, Yuji Matsumoto


Abstract
In this paper, we present an attempt to port the international syntactic annotation scheme Universal Dependencies to Japanese. Since Japanese syntactic structure is usually annotated with language-specific chunk-based (bunsetsu-based) dependencies, we first introduce word-based dependencies using a word unit called the Short Unit Word, which usually corresponds to an entry in the lexicon UniDic. Porting is done by mapping the UniDic part-of-speech tagset to the universal part-of-speech tagset and by converting a constituent-based treebank into typed dependency trees. The conversion is not straightforward; we discuss the problems that arose during conversion and our current solutions. A treebank of 10,000 sentences was built by converting existing resources and is currently released to the public.
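The two steps the abstract names (mapping UniDic part-of-speech tags to UPOS, and turning chunk-level dependencies into word-level ones) can be illustrated with a minimal sketch. It makes several loud assumptions: `UNIDIC_TO_UPOS`, `FUNCTION_UPOS`, `to_upos`, `chunk_head`, `convert`, `Word` and `Chunk` are hypothetical names; the POS table is a small illustrative subset, not the mapping used for UD Japanese; the "rightmost content word heads the chunk" rule is a common simplification, not necessarily the paper's rule; the input is bunsetsu-level dependencies rather than the constituent-based treebank the paper converts; and dependency labels (DEPREL) are omitted.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical, deliberately partial UniDic-to-UPOS table keyed on leading
# fields of a hyphen-joined UniDic POS string. Illustrative only; this is NOT
# the mapping table used for UD Japanese.
UNIDIC_TO_UPOS = {
    "名詞-固有名詞": "PROPN",
    "名詞": "NOUN",
    "代名詞": "PRON",
    "動詞": "VERB",
    "形容詞": "ADJ",
    "副詞": "ADV",
    "助動詞": "AUX",
    "助詞-格助詞": "ADP",
    "助詞-接続助詞": "SCONJ",
    "助詞": "PART",
    "補助記号": "PUNCT",
}

# UPOS tags treated as function words when choosing a chunk head (assumption).
FUNCTION_UPOS = {"ADP", "SCONJ", "PART", "AUX", "PUNCT"}


def to_upos(unidic_pos: str) -> str:
    """Map a UniDic POS string to UPOS via the longest matching prefix."""
    fields = unidic_pos.split("-")
    for n in range(len(fields), 0, -1):
        key = "-".join(fields[:n])
        if key in UNIDIC_TO_UPOS:
            return UNIDIC_TO_UPOS[key]
    return "X"  # unmapped categories fall back to X in this sketch


@dataclass
class Word:
    form: str
    unidic_pos: str


@dataclass
class Chunk:
    words: List[Word]
    head_chunk: int  # index of the governing chunk; -1 marks the root chunk


def chunk_head(chunk: Chunk) -> int:
    """Heuristic: the rightmost content (non-function) word heads the chunk."""
    for i in range(len(chunk.words) - 1, -1, -1):
        if to_upos(chunk.words[i].unidic_pos) not in FUNCTION_UPOS:
            return i
    return 0  # chunk made only of function words: fall back to its first word


def convert(chunks: List[Chunk]) -> List[Tuple[int, str, str, int]]:
    """Turn chunk-level dependencies into (id, form, upos, head) word rows."""
    offsets, total = [], 0
    for c in chunks:
        offsets.append(total)
        total += len(c.words)
    # 1-based word id of each chunk's head word
    head_ids = [offsets[i] + chunk_head(c) + 1 for i, c in enumerate(chunks)]

    rows = []
    for ci, c in enumerate(chunks):
        for wi, w in enumerate(c.words):
            wid = offsets[ci] + wi + 1
            if wid == head_ids[ci]:
                # Chunk head attaches to the head word of the governing chunk.
                head = 0 if c.head_chunk < 0 else head_ids[c.head_chunk]
            else:
                # Every other word in the chunk attaches to the chunk head.
                head = head_ids[ci]
            rows.append((wid, w.form, to_upos(w.unidic_pos), head))
    return rows


# 太郎が本を読んだ。 ("Taro read a book."), pre-segmented into chunks.
EXAMPLE = [
    Chunk([Word("太郎", "名詞-固有名詞-人名-名"), Word("が", "助詞-格助詞")], 2),
    Chunk([Word("本", "名詞-普通名詞-一般"), Word("を", "助詞-格助詞")], 2),
    Chunk([Word("読ん", "動詞-一般"), Word("だ", "助動詞"),
           Word("。", "補助記号-句点")], -1),
]

if __name__ == "__main__":
    for wid, form, upos, head in convert(EXAMPLE):
        print(f"{wid}\t{form}\t{upos}\t{head}")
```

Run on the example, the verb 読ん becomes the root (head 0), the content words 太郎 and 本 attach to it, and the function words が, を, だ and 。 attach to the content word of their own chunk. Keeping content words as heads is in line with UD's general preference for content-word heads, which is one reason chunk-based annotation cannot simply be relabeled.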
Anthology ID: L16-1261
Volume: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month: May
Year: 2016
Address: Portorož, Slovenia
Venue: LREC
Publisher: European Language Resources Association (ELRA)
Pages: 1651–1658
URL: https://aclanthology.org/L16-1261
Cite (ACL): Takaaki Tanaka, Yusuke Miyao, Masayuki Asahara, Sumire Uematsu, Hiroshi Kanayama, Shinsuke Mori, and Yuji Matsumoto. 2016. Universal Dependencies for Japanese. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1651–1658, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal): Universal Dependencies for Japanese (Tanaka et al., LREC 2016)
PDF: https://preview.aclanthology.org/ingestion-script-update/L16-1261.pdf