Hieu Trung Nguyen

2025

Structured Pruning for Diverse Best-of-N Reasoning Optimization
Hieu Trung Nguyen | Bao Nguyen | Viet Anh Nguyen
Findings of the Association for Computational Linguistics: ACL 2025

Model pruning in transformer-based language models, traditionally seen as a means of computational savings, can enhance the model’s reasoning capabilities. In this work, we uncover the surprising phenomenon that the selective pruning of certain attention heads leads to improvements in reasoning performance, particularly on challenging tasks. Motivated by this observation, we propose SPRINT, a novel contrastive learning framework that dynamically selects the optimal head and layer to prune during inference. By aligning question embeddings with head embeddings, our approach identifies the pruned-head configurations that result in more accurate reasoning. Extensive experiments demonstrate that our method significantly outperforms traditional best-of-N and random head selection strategies on the MATH500 and GSM8K datasets.

Advances in Large Language Models (LLMs) have paved the way for their emerging applications in various domains, such as human behavior simulations, where LLMs could augment human-generated data in social science research and machine learning model training. However, pretrained LLMs often fail to capture the behavioral diversity of target populations due to the inherent variability across individuals and groups. To address this, we propose Mixture of Personas (MoP), a probabilistic prompting method that aligns LLM responses with the target population. MoP is a contextual mixture model, where each component is an LM agent characterized by a persona and an exemplar that represents the behaviors of a subpopulation. The persona and the exemplar are randomly chosen according to the learned mixing weights to elicit diverse LLM responses during simulation. MoP is flexible, does not require model fine-tuning, and is transferable between base models. Experiments for synthetic data generation show that MoP outperforms competing methods in alignment and diversity metrics.

Task-driven Layerwise Additive Activation Intervention
Hieu Trung Nguyen | Bao Nguyen | Binh Nguyen | Viet Anh Nguyen
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)

Modern language models (LMs) have significantly advanced generative modeling in natural language processing (NLP). Despite their success, LMs often struggle with adaptation to new contexts in real-time applications. A promising approach to task adaptation is activation intervention, which steers the LMs’ generation process by identifying and manipulating the activations. However, existing interventions rely heavily on heuristic rules or require many prompt inputs to determine effective interventions. In this paper, we propose a layer-wise additive activation intervention framework that optimizes the intervention process, thereby enhancing sample efficiency. We evaluate our framework on various datasets, demonstrating improvements in accuracy over pretrained LMs and competing intervention baselines.

Co-authors

Weikang Qiu 1

Julian Theodore 1

Rex Ying 1

Venues

findings 2
naacl 1
