2025
Multi-matrix Factorization Attention
Jingcheng Hu | Houyi Li | Yinmin Zhang | Zili Wang | Shuigeng Zhou | Xiangyu Zhang | Heung-Yeung Shum
Findings of the Association for Computational Linguistics: ACL 2025
We propose novel attention architectures, Multi-matrix Factorization Attention (MFA) and MFA-Key-Reuse (MFA-KR). Existing variants of standard Multi-Head Attention (MHA), including SOTA methods like MLA, fail to maintain as strong performance under stringent Key-Value cache (KV cache) constraints. MFA enhances model capacity by efficiently scaling up both the number and dimension of attention heads through low-rank matrix factorization in the Query-Key (QK) circuit. Extending MFA, MFA-KR further reduces memory requirements by repurposing the key cache as value through value projection re-parameterization. MFA’s design enables strong model capacity when working under a tight KV cache budget, while MFA-KR is suitable for even harsher KV cache limits with a minor performance trade-off. Notably, in our extensive and large-scale experiments, the proposed architecture outperforms MLA and performs comparably to MHA, while reducing KV cache usage by up to 56% and 93.7%, respectively.
2020
The Design and Implementation of XiaoIce, an Empathetic Social Chatbot
Li Zhou | Jianfeng Gao | Di Li | Heung-Yeung Shum
Computational Linguistics, Volume 46, Issue 1 - March 2020
This article describes the development of Microsoft XiaoIce, the most popular social chatbot in the world. XiaoIce is uniquely designed as an artificial intelligence companion with an emotional connection to satisfy the human need for communication, affection, and social belonging. We take into account both intelligent quotient and emotional quotient in system design, cast human–machine social chat as decision-making over Markov Decision Processes, and optimize XiaoIce for long-term user engagement, measured in expected Conversation-turns Per Session (CPS). We detail the system architecture and key components, including dialogue manager, core chat, skills, and an empathetic computing module. We show how XiaoIce dynamically recognizes human feelings and states, understands user intent, and responds to user needs throughout long conversations. Since the release in 2014, XiaoIce has communicated with over 660 million active users and succeeded in establishing long-term relationships with many of them. Analysis of large-scale online logs shows that XiaoIce has achieved an average CPS of 23, which is significantly higher than that of other chatbots and even human conversations.
2012
Twitter Topic Summarization by Ranking Tweets using Social Influence and Content Quality
Yajuan Duan | Zhumin Chen | Furu Wei | Ming Zhou | Heung-Yeung Shum
Proceedings of COLING 2012
2010
An Empirical Study on Learning to Rank of Tweets
Yajuan Duan | Long Jiang | Tao Qin | Ming Zhou | Heung-Yeung Shum
Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010)