An Nguyen

2026

UIT-AMMC at SemEval-2026 Task 13: Exploiting Structural Formatting Signatures for Robust AI-Generated Code Detection
Cuong Pham | Minh Nguyen | Minh Le | An Nguyen | Chinh Nguyen
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

We participated in Subtask A with our Structure-Aware Contrastive Cascade, a multi-stage architecture designed to distinguish between human-authored and machine-generated code by integrating generative reasoning with explicit structural linguistic features. Our system focuses on exploiting structural formatting signatures that frequently emerge in AI-generated code as a byproduct of post-training alignment and readability optimization. The pipeline utilizes a Qwen-2.5-Coder 14B model fine-tuned via QLoRA, incorporating stochastic data augmentation techniques to ensure robustness across unseen programming languages. Final classification is achieved through a late-fusion mechanism that combines contrastive probability scores with statistical metrics of code presentation density. For samples exhibiting high epistemic uncertainty, we implement a multi-agent adversarial debate step to refine the final verdict. This approach enabled our system to achieve a Macro F1 score of 0.802, ranking 3rd on the official leaderboard.
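The abstract mentions a late-fusion step that combines a contrastive model probability with statistics of code presentation density, and escalates uncertain samples to a further stage. The sketch below illustrates that idea under explicit assumptions. It is not the authors' implementation. The feature set (blank-line, comment-line and trailing-whitespace ratios), the `structural_score` heuristic and its constants, the fusion weight `alpha`, and the uncertainty `band` are all hypothetical. The model probability `p_model` is assumed to come from some already trained classifier.

```python
from dataclasses import dataclass


@dataclass
class FormattingStats:
    """Hypothetical 'presentation density' features of one code snippet."""
    blank_line_ratio: float
    comment_line_ratio: float
    trailing_ws_ratio: float


def formatting_stats(code: str) -> FormattingStats:
    """Compute simple line-level formatting ratios (illustrative feature set)."""
    lines = code.splitlines() or [""]
    n = len(lines)
    blank = sum(1 for ln in lines if not ln.strip())
    # Assumption: a line is a comment if it starts with '#' or '//' after indentation.
    comment = sum(1 for ln in lines if ln.strip().startswith(("#", "//")))
    trailing = sum(1 for ln in lines if ln != ln.rstrip())
    return FormattingStats(blank / n, comment / n, trailing / n)


def structural_score(s: FormattingStats) -> float:
    """Hypothetical heuristic in [0, 1]: tidy, commented, whitespace-clean code scores higher."""
    return (
        0.4 * min(s.comment_line_ratio / 0.3, 1.0)
        + 0.3 * min(s.blank_line_ratio / 0.2, 1.0)
        + 0.3 * (1.0 - min(s.trailing_ws_ratio * 5, 1.0))
    )


def classify(p_model: float, code: str, alpha: float = 0.8, band: float = 0.1):
    """Late fusion: weighted average of a model probability and the structural score.

    Returns (label, fused_probability, uncertain). `uncertain` flags samples
    close to the 0.5 decision boundary, which a further stage (such as the
    debate step described in the abstract) could re-examine.
    """
    fused = alpha * p_model + (1.0 - alpha) * structural_score(formatting_stats(code))
    label = "machine" if fused >= 0.5 else "human"
    return label, fused, abs(fused - 0.5) < band
```

For example, with `snippet = "def add(a, b):\n    # Return the sum.\n    return a + b\n"`, `classify(0.95, snippet)` yields the label `"machine"` with `uncertain` set to `False`. A model probability of `0.5` lands inside the uncertainty band and would be flagged for escalation. A weighted average is only the simplest fusion rule; a learned combiner over the same features is a common alternative.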

2021


Learning the surface structure of wh-questions in English and French with a non-parametric Bayesian model
An Nguyen | Colin Wilson
Proceedings of the Society for Computation in Linguistics 2021

2017


ICE: Idiom and Collocation Extractor for Research and Education
Vasanthi Vuppuluri | Shahryar Baki | An Nguyen | Rakesh Verma
Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics

Collocation and idiom extraction are well-known challenges with many potential applications in Natural Language Processing (NLP). Our experimental, open-source software system, called ICE, is a Python package for flexibly extracting collocations and idioms, currently in English. It also has a competitive POS tagger that can be used alone or as part of collocation/idiom extraction. ICE is available free of cost for research and educational uses in two user-friendly formats. This paper gives an overview of ICE and its performance, and briefly describes the research underlying the extraction algorithms.
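ICE's own extraction algorithms are described in the paper itself. As a generic illustration of what collocation extraction involves, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI), a standard association measure. It is not ICE's algorithm or API. The function name `pmi_collocations`, the `min_count` threshold, and the whitespace tokenization in the usage note are assumptions made for illustration.

```python
import math
from collections import Counter


def pmi_collocations(tokens, min_count=2):
    """Rank adjacent bigrams by PMI = log2(P(w1, w2) / (P(w1) * P(w2)))."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)
    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_joint = count / n_bi
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_joint / (p1 * p2))
    # Highest-PMI pairs first: these co-occur far more often than chance predicts.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

On `"red wine and red wine and the dog and the cat".split()`, the pair `("red", "wine")` ranks first, because both words are rare on their own but always appear together. Raw PMI favors rare pairs, which is why a minimum count (or an alternative measure such as a log-likelihood ratio) is usually applied.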

Co-authors

Rakesh Verma 1

Vasanthi Vuppuluri 1

Colin Wilson 1
