Minh

Also published as: Minh Le


2026

We participated in Subtask A with our Structure-Aware Contrastive Cascade, a multi-stage architecture designed to distinguish between human-authored and machine-generated code by integrating generative reasoning with explicit structural linguistic features. Our system focuses on exploiting structural formatting signatures that frequently emerge in AI-generated code as a byproduct of post-training alignment and readability optimization. The pipeline utilizes a Qwen-2.5-Coder 14B model fine-tuned via QLoRA, incorporating stochastic data augmentation techniques to ensure robustness across unseen programming languages. Final classification is achieved through a late-fusion mechanism that combines contrastive probability scores with statistical metrics of code presentation density. For samples exhibiting high epistemic uncertainty, we implement a multi-agent adversarial debate step to refine the final verdict. This approach enabled our system to achieve a Macro F1 score of 0.802, ranking 3rd on the official leaderboard.

2018

LSTM-based language models have been shown effective in Word Sense Disambiguation (WSD). In particular, the technique proposed by Yuan et al. (2016) returned state-of-the-art performance in several benchmarks, but neither the training data nor the source code was released. This paper presents the results of a reproduction study and analysis of this technique using only openly available datasets (GigaWord, SemCor, OMSTI) and software (TensorFlow). Our study showed that similar results can be obtained with much less data than hinted at by Yuan et al. (2016). Detailed analyses shed light on the strengths and weaknesses of this method. First, adding more unannotated training data is useful, but is subject to diminishing returns. Second, the model can correctly identify both popular and unpopular meanings. Finally, the limited sense coverage in the annotated datasets is a major limitation. All code and trained models are made freely available.

2017

Error propagation is a common problem in NLP. Reinforcement learning explores erroneous states during training and can therefore be more robust when mistakes are made early in a process. In this paper, we apply reinforcement learning to greedy dependency parsing which is known to suffer from error propagation. Reinforcement learning improves accuracy of both labeled and unlabeled dependencies of the Stanford Neural Dependency Parser, a high performance greedy parser, while maintaining its efficiency. We investigate the portion of errors which are the result of error propagation and confirm that reinforcement learning reduces the occurrence of error propagation.

2015