2025
Large Language Models and Causal Inference in Collaboration: A Comprehensive Survey
Xiaoyu Liu | Paiheng Xu | Junda Wu | Jiaxin Yuan | Yifan Yang | Yuhang Zhou | Fuxiao Liu | Tianrui Guan | Haoliang Wang | Tong Yu | Julian McAuley | Wei Ai | Furong Huang
Findings of the Association for Computational Linguistics: NAACL 2025
Causal inference has demonstrated significant potential to enhance Natural Language Processing (NLP) models in areas such as predictive accuracy, fairness, robustness, and explainability by capturing causal relationships among variables. The rise of generative Large Language Models (LLMs) has greatly impacted various language processing tasks. This survey focuses on research that evaluates or improves LLMs from a causal view in the following areas: reasoning capacity, fairness and safety issues, explainability, and handling multimodality. Meanwhile, LLMs can assist in causal inference tasks, such as causal relationship discovery and causal effect estimation, by leveraging their generation ability and knowledge learned during pre-training. This review explores the interplay between causal inference frameworks and LLMs from both perspectives, emphasizing their collective potential to further the development of more advanced and robust artificial intelligence systems.
2022
COMPILING: A Benchmark Dataset for Chinese Complexity Controllable Definition Generation
Jiaxin Yuan | Cunliang Kong | Chenhui Xie | Liner Yang | Erhong Yang
Proceedings of the 21st Chinese National Conference on Computational Linguistics
The definition generation task aims to automatically generate a word's definition within a specific context. However, owing to the lack of datasets covering different complexity levels, the definitions produced by models tend to stay at a single complexity level. This paper proposes a novel task of generating definitions for a word at controllable complexity levels. Correspondingly, we introduce COMPILING, a dataset providing detailed information about Chinese definitions, in which each definition is labeled with its complexity level. The COMPILING dataset includes 74,303 words and 106,882 definitions. To the best of our knowledge, it is the largest dataset for the Chinese definition generation task. We select various representative generation methods as baselines for this task and conduct evaluations, which demonstrate that our dataset substantially assists models in generating definitions at different complexity levels. We believe that the COMPILING dataset will benefit further research on complexity-controllable definition generation.
2020
Construction of a Treebank of Learner Chinese (汉语学习者依存句法树库构建)
Jialu Shi (师佳璐) | Xinyu Luo (罗昕宇) | Liner Yang (杨麟儿) | Dan Xiao (肖丹) | Zhengsheng Hu (胡正声) | Yijun Wang (王一君) | Jiaxin Yuan (袁佳欣) | Jingsi Yu (余婧思) | Erhong Yang (杨尔弘)
Proceedings of the 19th Chinese National Conference on Computational Linguistics
Treebanks of learner Chinese provide dependency syntactic analysis for non-native-speaker corpora. They can support second language teaching and research, and are also of great significance to related work on syntactic parsing and grammatical error correction for second language learners. However, existing learner Chinese dependency treebanks are few in number, and some problems remain in their annotation. To address this, this paper improves the dependency annotation guidelines, builds an online annotation platform, and carries out dependency annotation of learner Chinese. The paper focuses on issues such as data selection and the annotation workflow, analyzes the quality of the annotation results, and explores the influence of second-language errors on annotation quality and syntactic parsing.