Hanchen Xia


2025

pdf bib
Chumor 2.0: Towards Better Benchmarking Chinese Humor Understanding from 弱智吧 (Ruo Zhi Ba)
Ruiqi He | Yushu He | Longju Bai | Jiarui Liu | Zhenjie Sun | Zenghao Tang | He Wang | Hanchen Xia | Rada Mihalcea | Naihao Deng
Findings of the Association for Computational Linguistics: ACL 2025

Existing humor datasets and evaluations predominantly focus on English, leaving limited resources for culturally nuanced humor in non-English languages like Chinese. To address this gap, we construct **Chumor**, the first and largest Chinese humor explanation dataset. **Chumor** is sourced from Ruo Zhi Ba (RZB, 弱智吧), a Chinese Reddit-like platform known for sharing intellectually challenging and culturally specific jokes. We test ten LLMs through direct and chain-of-thought prompting, revealing that **Chumor** poses significant challenges to existing LLMs, with accuracy only slightly above random and far below human performance. In addition, our analysis highlights that human-annotated humor explanations are significantly better than those generated by GPT-4o and ERNIE4-turbo. We release **Chumor** at https://huggingface.co/datasets/MichiganNLP/Chumor . Our project page and codebase are at https://github.com/MichiganNLP/Chumor-2.0 , and our leaderboard is at https://huggingface.co/spaces/MichiganNLP/Chumor-leaderboard .

pdf bib
R3: “This is My SQL, Are You With Me?” A Consensus-Based Multi-Agent System for Text-to-SQL Tasks
Hanchen Xia | Feng Jiang | Naihao Deng | Cunxiang Wang | Guojiang Zhao | Rada Mihalcea | Yue Zhang
Proceedings of the 4th Table Representation Learning Workshop

Large Language Models (LLMs) have demonstrated exceptional performance across diverse tasks. To harness their capabilities for Text-to-SQL, we introduce R3 (Review-Rebuttal-Revision), a consensus-based multi-agent system for Text-to-SQL tasks. R3 achieves the new state-of-the-art performance of 89.9 on the Spider test set. In the meantime, R3 achieves 61.80 on the Bird development set. R3 outperforms existing single-LLM and multi-agent Text-to-SQL systems by 1.3% to 8.1% on Spider and Bird, respectively. Surprisingly, we find that for Llama-3-8B, R3 outperforms chain-of-thought prompting by over 20%, even outperforming GPT-3.5 on the Spider development set. We open-source our codebase at https://github.com/1ring2rta/R3.