Abstract
Languages change over time, and ancient languages have been studied in linguistics and related fields. A main challenge in this research area is the lack of empirical data; for instance, ancient spoken languages often leave little trace of their linguistic properties. From the perspective of natural language processing (NLP), although the community has created dozens of annotated corpora, very few of them cover ancient languages. As a step toward bridging this gap, we have created a word-segmented and POS-tagged corpus for Archaic Chinese using articles from Huainanzi, a book written during China's Western Han Dynasty (206 BC–9 AD). We then compare this corpus with the Chinese Penn Treebank (CTB), a well-known corpus of Modern Chinese, and report several notable differences and similarities between the two corpora. Finally, we demonstrate that the CTB can be used to improve the performance of word segmenters and POS taggers for Archaic Chinese, but only through features that behave similarly in the two corpora.
- Anthology ID:
- L14-1163
- Volume:
- Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
- Month:
- May
- Year:
- 2014
- Address:
- Reykjavik, Iceland
- Editors:
- Nicoletta Calzolari, Khalid Choukri, Thierry Declerck, Hrafn Loftsson, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association (ELRA)
- Pages:
- 3129–3136
- URL:
- http://www.lrec-conf.org/proceedings/lrec2014/pdf/138_Paper.pdf
- Cite (ACL):
- Yan Song and Fei Xia. 2014. Modern Chinese Helps Archaic Chinese Processing: Finding and Exploiting the Shared Properties. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3129–3136, Reykjavik, Iceland. European Language Resources Association (ELRA).
- Cite (Informal):
- Modern Chinese Helps Archaic Chinese Processing: Finding and Exploiting the Shared Properties (Song & Xia, LREC 2014)