A Dependency Treebank of the Chinese Buddhist Canon

Tak-sum Wong, John Lee


Abstract
We present a dependency treebank of the Chinese Buddhist Canon, which contains 1,514 texts with about 50 million Chinese characters. The treebank was created by an automatic parser trained on a smaller treebank, containing four manually annotated sutras (Lee and Kong, 2014). We report results on word segmentation, part-of-speech tagging and dependency parsing, and discuss challenges posed by the processing of medieval Chinese. In a case study, we exploit the treebank to examine verbs frequently associated with Buddha, and to analyze usage patterns of quotative verbs in direct speech. Our results suggest that certain quotative verbs imply status differences between the speaker and the listener.
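The case study described in the abstract amounts to querying dependency arcs. As a minimal sketch of that kind of query, the Python below counts the verbs that govern a given word as subject. Everything in it is assumed for illustration: the row layout `(id, form, pos, head, deprel)`, the `nsubj`/`dobj`/`root` labels, the POS tags, the function name `verbs_with_subject`, and the four sample sentences are invented and are not taken from the treebank or its annotation scheme.

```python
from collections import Counter

# Hypothetical CoNLL-style rows: (id, form, pos, head, deprel).
# Labels and sentences are invented for illustration only; the real
# treebank's tagset and relation names may differ.
SAMPLE = [
    [(1, "佛", "NN", 2, "nsubj"), (2, "告", "VV", 0, "root"), (3, "阿難", "NR", 2, "dobj")],
    [(1, "佛", "NN", 2, "nsubj"), (2, "言", "VV", 0, "root")],
    [(1, "阿難", "NR", 2, "nsubj"), (2, "白", "VV", 0, "root"), (3, "佛", "NN", 2, "dobj")],
    [(1, "佛", "NN", 2, "nsubj"), (2, "告", "VV", 0, "root"), (3, "比丘", "NN", 2, "dobj")],
]


def verbs_with_subject(sentences, subject, subj_rel="nsubj"):
    """Count head words of tokens that match `subject` and carry `subj_rel`."""
    counts = Counter()
    for sent in sentences:
        # Index tokens by id so a head id can be resolved to its word form.
        by_id = {tok[0]: tok for tok in sent}
        for _tid, form, _pos, head, rel in sent:
            if form == subject and rel == subj_rel and head in by_id:
                counts[by_id[head][1]] += 1
    return counts


buddha_verbs = verbs_with_subject(SAMPLE, "佛")
for verb, n in buddha_verbs.most_common():
    print(verb, n)
```

Swapping `subj_rel` for an object relation would give the complementary view (verbs whose listener is a given word), which is the sort of contrast an analysis of speaker and listener status would need.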
Anthology ID:
L16-1265
Volume:
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Month:
May
Year:
2016
Address:
Portorož, Slovenia
Venue:
LREC
Publisher:
European Language Resources Association (ELRA)
Pages:
1679–1683
URL:
https://aclanthology.org/L16-1265
Cite (ACL):
Tak-sum Wong and John Lee. 2016. A Dependency Treebank of the Chinese Buddhist Canon. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16), pages 1679–1683, Portorož, Slovenia. European Language Resources Association (ELRA).
Cite (Informal):
A Dependency Treebank of the Chinese Buddhist Canon (Wong & Lee, LREC 2016)
PDF:
https://preview.aclanthology.org/update-css-js/L16-1265.pdf