Xin Chen

2022

AstBERT: Enabling Language Model for Financial Code Understanding with Abstract Syntax Trees
Rong Liang | Tiehua Zhang | Yujie Lu | Yuze Liu | Zhen Huang | Xin Chen
Proceedings of the Fourth Workshop on Financial Technology and Natural Language Processing (FinNLP)

Using pre-trained language models to understand source code has attracted increasing attention from financial institutions, owing to its great potential to uncover financial risks. However, applying these language models directly to programming-language problems poses several challenges. For instance, the shift in domain knowledge between natural language (NL) and programming language (PL) requires understanding the semantic and syntactic information in the data from different perspectives. To this end, we propose AstBERT, a pre-trained PL model that aims to better understand financial code using the abstract syntax tree (AST). Specifically, we collect a large amount of source code (both Java and Python) from the Alipay code repository and incorporate both syntactic and semantic code knowledge into our model with the help of code parsers, through which the AST information of the source code can be interpreted and integrated. We evaluate the proposed model on three tasks: code question answering, code clone detection, and code refinement. Experimental results show that AstBERT achieves promising performance on all three downstream tasks.

2021

Jointly Identifying Rhetoric and Implicit Emotions via Multi-Task Learning
Xin Chen | Zhen Hai | Deyu Li | Suge Wang | Dian Wang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

基于人物特征增强的拟人句要素抽取方法研究(Research on Element Extraction of Personified Sentences Based on Enhanced Characters)
Jing Li (李婧) | Suge Wang (王素格) | Xin Chen (陈鑫) | Dian Wang (王典)
Proceedings of the 20th Chinese National Conference on Computational Linguistics

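In appreciation-type questions for prose reading comprehension, the analysis of personified sentences is tested frequently. Existing work identifies and extracts only the tenor elements of personified sentences, which leaves element extraction incomplete; in particular, when a sentence contains multiple tenors, the correspondence between each personifying word and each tenor must be determined. To address these problems, this paper proposes an element extraction method for personified sentences based on character feature enhancement. The method uses domain-specific features to enhance the vector representation of the sentence, then uses a conditional random field model to identify the tenor and personifying-word elements in personified sentences. On this basis, a self-attention mechanism detects the relations between elements, and an element synchronization mechanism and a relation synchronization mechanism exchange information to update the inputs for element recognition and relation detection. Comparative experiments on <tenor, personifying word> extraction on a self-built personification dataset show that the proposed model outperforms the other compared models.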