Yuanyuan Sun


Two Languages Are Better than One: Bilingual Enhancement for Chinese Named Entity Recognition
Jinzhong Ning | Zhihao Yang | Zhizheng Wang | Yuanyuan Sun | Hongfei Lin | Jian Wang
Proceedings of the 29th International Conference on Computational Linguistics

Chinese Named Entity Recognition (NER) has continued to attract research attention. However, most existing studies explore only features internal to the Chinese language and neglect features from other languages. As a complementary source of knowledge for Chinese text, English contains rich cues about entities that can potentially improve the performance of Chinese NER. In this study, we therefore explore bilingual enhancement for Chinese NER and propose a unified bilingual interaction module, the Adapted Cross-Transformers with Global Sparse Attention (ACT-S), to capture the interaction of bilingual information. We use a model built upon several different ACT-Ss to integrate the rich English information into the Chinese representation. Moreover, our model can learn both the interaction between the two languages (inter-features) and the dependency information within Chinese (intra-features). Compared with existing Chinese NER methods, our proposed model can better handle entities with complex structures. The English text that enhances the model is generated automatically by machine translation, avoiding high labour costs. Experimental results on four well-known benchmark datasets demonstrate the effectiveness and robustness of our proposed model.


面向法律文本的实体关系联合抽取算法 (Joint Entity and Relation Extraction for Legal Texts)
Wenhui Song (宋文辉) | Xiang Zhou (周翔) | Ping Yang (杨萍) | Yuanyuan Sun (孙媛媛) | Liang Yang (杨亮) | Hongfei Lin (林鸿飞)
Proceedings of the 20th Chinese National Conference on Computational Linguistics



Joint Entity and Relation Extraction for Legal Documents with Legal Feature Enhancement
Yanguang Chen | Yuanyuan Sun | Zhihao Yang | Hongfei Lin
Proceedings of the 28th International Conference on Computational Linguistics

In recent years, the wealth of information contained in Chinese legal documents has attracted a great deal of attention because of the large-scale release of judgment documents on China Judgments Online. There is a great need to enable machines to understand the semantic information stored in these documents, which are written in natural language. Information extraction provides a way of mining the valuable information implicit in the unstructured judgment documents. We propose a Legal Triplet Extraction System for drug-related criminal judgment documents. The system extracts entities and semantic relations jointly and benefits from the proposed legal lexicon feature and multi-task learning framework. Furthermore, we manually annotate a dataset for Named Entity Recognition and Relation Extraction in the Chinese legal domain, which supports training supervised triplet extraction models and evaluating model performance. Our experimental results show that introducing the legal feature and the multi-task learning framework is feasible and effective for the Legal Triplet Extraction System. The F1 score of triplet extraction reaches 0.836 on the legal dataset.

基于预训练语言模型的案件要素识别方法 (A Method for Case Factor Recognition Based on Pre-trained Language Models)
Haishun Liu (刘海顺) | Lei Wang (王雷) | Yanguang Chen (陈彦光) | Shuchen Zhang (张书晨) | Yuanyuan Sun (孙媛媛) | Hongfei Lin (林鸿飞)
Proceedings of the 19th Chinese National Conference on Computational Linguistics