Xiangfeng Meng

2026

Recent progress in Large Language Model (LLM) based Table Question Answering (TableQA) has demonstrated strong performance on standard benchmarks. However, existing benchmarks mainly focus on well-structured tables and fail to reflect the irregular structures and complex reasoning commonly encountered in real-world scenarios. We propose CompTab, a benchmark designed to evaluate TableQA under complex reasoning and irregular table conditions. CompTab covers six representative types, including semantic ambiguity, multi-hop reasoning, transposed tables, merged cells, missing values, and outliers. It is constructed from real-world seed tables across multiple domains using controlled LLM based generation and human verification to ensure realism and diversity. In addition, to improve the generalization of LLMs under complex and irregular table settings, we propose a two-stage training framework that progressively aligns models with textual reasoning and executable decision signals, instantiated as CompTabLLM. Evaluations on 38 representative LLMs and CompTabLLM show clear limitations of existing LLMs under realistic conditions, while the proposed framework improves generalization. CompTab thus provides a challenging benchmark for advancing TableQA in real-world.

2024

pdf bib abs

AISPACE at SemEval-2024 task 8: A Class-balanced Soft-voting System for Detecting Multi-generator Machine-generated Text
Renhua Gu | Xiangfeng Meng
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

SemEval-2024 Task 8 provides a challenge to detect human-written and machine-generated text. There are 3 subtasks for different detection scenarios. This paper proposes a system that mainly deals with Subtask B. It aims to detect if given full text is written by human or is generated by a specific Large Language Model (LLM), which is actually a multi-class text classification task. Our team AISPACE conducted a systematic study of fine-tuning transformer-based models, including encoder-only, decoder-only and encoder-decoder models. We compared their performance on this task and identified that encoder-only models performed exceptionally well. We also applied a weighted Cross Entropy loss function to address the issue of data imbalance of different class samples. Additionally, we employed soft-voting strategy over multi-models ensemble to enhance the reliability of our predictions. Our system ranked top 1 in Subtask B, which sets a state-of-the-art benchmark for this new challenge.

pdf bib abs

TM-TREK at SemEval-2024 Task 8: Towards LLM-Based Automatic Boundary Detection for Human-Machine Mixed Text
Xiaoyan Qu | Xiangfeng Meng
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

With the increasing prevalence of text gener- ated by large language models (LLMs), there is a growing concern about distinguishing be- tween LLM-generated and human-written texts in order to prevent the misuse of LLMs, such as the dissemination of misleading information and academic dishonesty. Previous research has primarily focused on classifying text as ei- ther entirely human-written or LLM-generated, neglecting the detection of mixed texts that con- tain both types of content. This paper explores LLMs’ ability to identify boundaries in human- written and machine-generated mixed texts. We approach this task by transforming it into a to- ken classification problem and regard the label turning point as the boundary. Notably, our ensemble model of LLMs achieved first place in the ‘Human-Machine Mixed Text Detection’ sub-task of the SemEval’24 Competition Task 8. Additionally, we investigate factors that in- fluence the capability of LLMs in detecting boundaries within mixed texts, including the incorporation of extra layers on top of LLMs, combination of segmentation loss, and the im- pact of pretraining. Our findings aim to provide valuable insights for future research in this area.

2023

pdf bib abs

This paper describes our system for SemEval-2023 Task 2 Multilingual Complex Named EntityRecognition (MultiCoNER II). Our teamSamsung Research China - Beijing proposesan AL-R (Adjustable Loss RoBERTa) model toboost the performance of recognizing short andcomplex entities with the challenges of longtaildata distribution, out of knowledge base andnoise scenarios. We first employ an adjustabledice loss optimization objective to overcomethe issue of long-tail data distribution, which isalso proved to be noise-robusted, especially incombatting the issue of fine-grained label confusing. Besides, we develop our own knowledgeenhancement tool to provide related contextsfor the short context setting and addressthe issue of out of knowledge base. Experimentshave verified the validation of our approaches.