Sen Hu
2026
CloneMem: Benchmarking Long-Term Memory for AI Clones
Sen Hu | Zhiyu Zhang | Yuxiang Wei | Xueran Han | Zhenheng Tang | Ronghao Chen | Huacan Wang
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
AI Clones aim to simulate an individual’s thoughts and behaviors to enable long-term, personalized interaction, placing stringent demands on memory systems to model experiences, emotions, and opinions over time. Existing memory benchmarks primarily rely on user–agent conversational histories, which are temporally fragmented and insufficient for capturing continuous life trajectories. We introduce CloneMem, a benchmark for evaluating long-term memory in AI Clone scenarios grounded in non-conversational digital traces, including diaries, social media posts, and emails, spanning one to three years. CloneMem adopts a top-down data construction framework to ensure longitudinal coherence and defines tasks that assess an agent’s ability to track evolving personal states. Experiments show that current memory mechanisms struggle in this setting, highlighting open challenges for life-grounded personalized AI. Code and dataset are available at https://github.com/AvatarMemory/CloneMemBench
KnowMe-Bench: Benchmarking Person Understanding for Lifelong Digital Companions
Tingyu Wu | Zhisheng Chen | Ziyan Weng | Shuhe Wang | Shuo Zhang | Sen Hu | Silin Wu | Qizhen Lan | Huacan Wang | Ronghao Chen
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Existing long-horizon memory benchmarks mostly use multi-turn dialogues or synthetic user histories, which makes retrieval performance an imperfect proxy for person understanding. We present KnowMe-Bench, a publicly releasable benchmark built from long-form autobiographical narratives, where actions, context, and inner thoughts provide dense evidence for inferring stable motivations and decision principles. KnowMe-Bench reconstructs each narrative into a flashback-aware, time-anchored stream and evaluates models with evidence-linked questions spanning factual recall, subjective state attribution, and principle-level reasoning. Across diverse narrative sources, retrieval-augmented systems mainly improve factual accuracy, while errors persist on temporally grounded explanations and higher-level inferences, highlighting the need for memory mechanisms beyond retrieval.
Does Memory Need Graphs? A Unified Framework and Empirical Analysis for Long-Term Dialog Memory
Sen Hu | Yuxiang Wei | Jiaxin Ran | Xueran Han | Zhiyuan Yao | Huacan Wang | Ronghao Chen | Lei Zou
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
2024
Are LLM-based Evaluators Confusing NLG Quality Criteria?
Xinyu Hu | Mingqi Gao | Sen Hu | Yang Zhang | Yicheng Chen | Teng Xu | Xiaojun Wan
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Some prior work has shown that LLMs perform well in NLG evaluation for different tasks. However, we discover that LLMs seem to confuse different evaluation criteria, which reduces their reliability. For further verification, we first address the inconsistent conceptualization and vague expression found in existing NLG quality criteria themselves, summarizing a clear hierarchical classification system for 11 common aspects with their corresponding criteria from previous studies. Inspired by behavioral testing, we carefully design 18 types of aspect-targeted perturbation attacks for fine-grained analysis of the evaluation behaviors of different LLMs. We also conduct human annotations beyond the guidance of the classification system to validate the impact of the perturbations. Our experimental results reveal confusion issues inherent in LLMs, as well as other noteworthy phenomena, and point to the need for further research and improvements in LLM-based evaluation.
2023
AdapterDistillation: Non-Destructive Task Composition with Knowledge Distillation
Junjie Wang | Yicheng Chen | Wangshu Zhang | Sen Hu | Teng Xu | Jing Zheng
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: Industry Track
Leveraging knowledge from multiple tasks by introducing a small number of task-specific parameters, known as adapters, into each transformer layer has received much attention recently. However, adding an extra fusion layer to implement knowledge composition not only increases inference time but is also non-scalable for some applications. To avoid these issues, we propose a two-stage knowledge distillation algorithm called AdapterDistillation. In the first stage, we extract task-specific knowledge by using local data to train a student adapter. In the second stage, we distill the knowledge from the existing teacher adapters into the student adapter to help its inference. Extensive experiments on frequently asked question retrieval in task-oriented dialog systems validate the efficiency of AdapterDistillation. We show that AdapterDistillation outperforms existing algorithms in terms of accuracy, resource consumption, and inference time.
Improving Knowledge Production Efficiency With Question Answering on Conversation
Changlin Yang | Siye Liu | Sen Hu | Wangshu Zhang | Teng Xu | Jing Zheng
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track)
Through an online customer service application, we have collected many conversations between customer service agents and customers. Building a knowledge production system, whose core module is question answering (QA) on these conversations, can help reduce the labor cost of maintaining the FAQ database for the customer service chatbot. However, most existing research focuses on document-based QA tasks, and there is a lack of research on conversation-based QA and related datasets, especially in Chinese. The challenges of conversation-based QA include: 1) answers may be scattered among multiple dialogue turns; 2) understanding complex dialogue contexts is more complicated than understanding documents. To address these challenges, we propose a multi-span extraction model for this task and introduce continual pre-training and multi-task learning schemes to further improve model performance. To validate our approach, we construct two Chinese datasets using dialogues as the knowledge source, namely cs-qaconv and kd-qaconv. Experimental results demonstrate that the proposed model outperforms the baseline on both datasets. The online application also verifies the effectiveness of our method. The dataset kd-qaconv will be released publicly for research purposes.
2021
NAMER: A Node-Based Multitasking Framework for Multi-Hop Knowledge Base Question Answering
Minhao Zhang | Ruoyu Zhang | Lei Zou | Yinnian Lin | Sen Hu
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Demonstrations
We present NAMER, an open-domain Chinese knowledge base question answering system based on a novel node-based framework that better grasps the structural mapping between questions and KB queries by aligning the nodes in a query with their corresponding mentions in the question. Equipped with techniques including data augmentation and multitasking, the proposed framework outperforms the previous SoTA on the CCKS CKBQA dataset. Moreover, we develop a novel data annotation strategy that facilitates the node-to-mention alignment; a dataset (https://github.com/ridiculouz/CKBQA) built with this strategy is also published to promote further research. An online demo of NAMER (http://kbqademo.gstore.cn) is provided to visualize our framework and supply extra information for users. A video illustration (https://youtu.be/yetnVye_hg4) of NAMER is also available.
2018
A State-transition Framework to Answer Complex Questions over Knowledge Base
Sen Hu | Lei Zou | Xinbo Zhang
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
Although natural language question answering over knowledge graphs has been studied in the literature, existing methods have some limitations in answering complex questions. To address that, in this paper, we propose a State Transition-based approach to translate a complex natural language question N to a semantic query graph (SQG), which is used to match the underlying knowledge graph to find the answers to question N. In order to generate the SQG, we propose four primitive operations (expand, fold, connect and merge) and a learning-based state transition approach. Extensive experiments on several benchmarks (such as QALD, WebQuestions and ComplexQuestions) with two knowledge bases (DBpedia and Freebase) confirm the superiority of our approach compared with state-of-the-art methods.
Co-authors
- Ronghao Chen 3
- Huacan Wang 3
- Teng Xu 3
- Lei Zou 3
- Yicheng Chen 2
- Xueran Han 2
- Yuxiang Wei 2
- Wangshu Zhang 2
- Jing Zheng 2
- Zhisheng Chen 1
- Mingqi Gao 1
- Xinyu Hu 1
- Qizhen Lan 1
- Yinnian Lin 1
- Siye Liu 1
- Jiaxin Ran 1
- Zhenheng Tang 1
- Xiaojun Wan 1
- Junjie Wang 1
- Shuhe Wang 1
- Ziyan Weng 1
- Silin Wu 1
- Tingyu Wu 1
- Changlin Yang 1
- Zhiyuan Yao 1
- Minhao Zhang 1
- Ruoyu Zhang 1
- Shuo Zhang 1
- Xinbo Zhang 1
- Yang Zhang 1
- Zhiyu Zhang 1