Jie Cao
2026
Translation via Annotation: A Computational Study of Translating Classical Chinese into Japanese
Zilong Li | Jie Cao
Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
In pre-modern times, classical Chinese was translated into Japanese using a system of annotations placed around the characters. We abstract this process as a set of sequence tagging tasks and adapt them to modern language technologies. Research on this annotation-and-translation system faces a low-resource problem, which we alleviate by introducing an LLM-based annotation pipeline and constructing a new dataset from digitized open-source translation data. We show that in this low-resource setting, introducing auxiliary Chinese NLP tasks enhances the training of the sequence tagging tasks. We also evaluate the performance of Large Language Models (LLMs) on this task: while they achieve high scores on direct machine translation, our method can serve as a supplement to LLMs by improving the quality of character-level annotation.
2025
Do LLMs Encode Frame Semantics? Evidence from Frame Identification
Jayanth Krishna Chundru | Rudrashis Poddar | Jie Cao | Tianyu Jiang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
We investigate whether large language models encode latent knowledge of frame semantics, focusing on frame identification, a core challenge in frame semantic parsing that involves selecting the appropriate semantic frame for a target word in context. Using the FrameNet lexical resource, we evaluate models under prompt-based inference and observe that they can perform frame identification effectively even without explicit supervision. To assess the impact of task-specific training, we fine-tune models on FrameNet data, which substantially improves in-domain accuracy while generalizing well to out-of-domain benchmarks. Further analysis shows that the models can generate semantically coherent frame definitions, highlighting their internalized understanding of frame semantics.