Minh Lê

Also published as: Minh Le

2026

UIT-AMMC at SemEval-2026 Task 13: Exploiting Structural Formatting Signatures for Robust AI-Generated Code Detection
Cuong Pham | Minh Nguyen | Minh Le | An Nguyen | Chinh Nguyen
Proceedings of the 20th International Workshop on Semantic Evaluation (2026)

We participated in Subtask A with our Structure-Aware Contrastive Cascade, a multi-stage architecture designed to distinguish between human-authored and machine-generated code by integrating generative reasoning with explicit structural linguistic features. Our system focuses on exploiting structural formatting signatures that frequently emerge in AI-generated code as a byproduct of post-training alignment and readability optimization. The pipeline utilizes a Qwen-2.5-Coder 14B model fine-tuned via QLoRA, incorporating stochastic data augmentation techniques to ensure robustness across unseen programming languages. Final classification is achieved through a late-fusion mechanism that combines contrastive probability scores with statistical metrics of code presentation density. For samples exhibiting high epistemic uncertainty, we implement a multi-agent adversarial debate step to refine the final verdict. This approach enabled our system to achieve a Macro F1 score of 0.802, ranking 3rd on the official leaderboard.

2018

Neural Models of Selectional Preferences for Implicit Semantic Role Labeling
Minh Le | Antske Fokkens
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

A Deep Dive into Word Sense Disambiguation with LSTM
Minh Le | Marten Postma | Jacopo Urbani | Piek Vossen
Proceedings of the 27th International Conference on Computational Linguistics

LSTM-based language models have been shown effective in Word Sense Disambiguation (WSD). In particular, the technique proposed by Yuan et al. (2016) returned state-of-the-art performance in several benchmarks, but neither the training data nor the source code was released. This paper presents the results of a reproduction study and analysis of this technique using only openly available datasets (GigaWord, SemCor, OMSTI) and software (TensorFlow). Our study showed that similar results can be obtained with much less data than hinted at by Yuan et al. (2016). Detailed analyses shed light on the strengths and weaknesses of this method. First, adding more unannotated training data is useful, but is subject to diminishing returns. Second, the model can correctly identify both popular and unpopular meanings. Finally, the limited sense coverage in the annotated datasets is a major limitation. All code and trained models are made freely available.

2017

Tackling Error Propagation through Reinforcement Learning: A Case of Greedy Dependency Parsing
Minh Lê | Antske Fokkens
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1, Long Papers

Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing which is known to suffer from error propagation. Reinforcement learning improves accuracy of both labeled and unlabeled dependencies of the Stanford Neural Dependency Parser, a high performance greedy parser, while maintaining its efficiency. We investigate the portion of errors which are the result of error propagation and confirm that reinforcement learning reduces the occurrence of error propagation.
