Yan Huang

2025

pdf bib abs
Enhancing Marker Scoring Accuracy through Ordinal Confidence Modelling in Educational Assessments
Abhirup Chakravarty | Mark Brenchley | Trevor Breakspear | Ian Lewin | Yan Huang
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 6: Industry Track)

A key ethical challenge in Automated Essay Scoring (AES) is ensuring that scores are only released when they meet high reliability standards. Confidence modelling addresses this by assigning a reliability estimate measure, in the form of a confidence score, to each automated score. In this study, we frame confidence estimation as a classification task: predicting whether an AES-generated score correctly places a candidate in the appropriate CEFR level. While this is a binary decision, we leverage the inherent granularity of the scoring domain in two ways. First, we reformulate the task as an n-ary classification problem using score binning. Second, we introduce a set of novel Kernel Weighted Ordinal Categorical Cross Entropy (KWOCCE) loss functions that incorporate the ordinal structure of CEFR labels. Our best-performing model achieves an F1 score of 0.97, and enables the system to release 47% of scores with 100% CEFR agreement and 99% with at least 95% CEFR agreement — compared to ≈ 92 % CEFR agreement from the standalone AES model where we release all AM predicted scores.

pdf bib abs
DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration
Hanzhi Zhang | Heng Fan | Kewei Sha | Yan Huang | Yunhe Feng
Findings of the Association for Computational Linguistics: ACL 2025

Long-context understanding is crucial for many NLP applications, yet transformers struggle with efficiency due to the quadratic complexity of self-attention. Sparse attention methods alleviate this cost but often impose static, predefined masks, failing to capture heterogeneous attention patterns. This results in suboptimal token interactions, limiting adaptability and retrieval accuracy in long-sequence tasks. This work introduces a dynamic sparse attention mechanism that assigns adaptive masks at the attention-map level, preserving heterogeneous patterns across layers and heads. Unlike existing approaches, our method eliminates the need for fine-tuning and predefined mask structures while maintaining computational efficiency. By learning context-aware attention structures, it achieves high alignment with full-attention models, ensuring minimal performance degradation while reducing memory and compute overhead. This approach provides a scalable alternative to full attention, enabling the practical deployment of large-scale Large Language Models (LLMs) without sacrificing retrieval performance. DAM is available at: https://github.com/HanzhiZhang-Ulrica/DAM.

2024

pdf bib abs
Analyzing Large Language Models’ Capability in Location Prediction
Zhaomin Xiao | Yan Huang | Eduardo Blanco
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)

In this paper, we investigate and evaluate large language models’ capability in location prediction. We present experimental results with four models—FLAN-T5, FLAN-UL2, FLAN-Alpaca, and ChatGPT—in various instruction finetuning and exemplar settings. We analyze whether taking into account the context—tweets published before and after the tweet mentioning a location—is beneficial. Additionally, we conduct an ablation study to explore whether instruction modification is beneficial. Lastly, our qualitative analysis sheds light on the errors made by the best-performing model.

2023

pdf bib abs
Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language Models
Somayeh Ghanbarzadeh | Yan Huang | Hamid Palangi | Radames Cruz Moreno | Hamed Khanpour
Findings of the Association for Computational Linguistics: ACL 2023

Recent studies have revealed that the widely-used Pre-trained Language Models (PLMs) propagate societal biases from the large unmoderated pre-training corpora. Existing solutions require debiasing training processes and datasets for debiasing, which are resource-intensive and costly. Furthermore, these methods hurt the PLMs’ performance on downstream tasks. In this study, we propose Gender-tuning, which debiases the PLMs through fine-tuning on downstream tasks’ datasets. For this aim, Gender-tuning integrates Masked Language Modeling (MLM) training objectives into fine-tuning’s training process. Comprehensive experiments show that Gender-tuning outperforms the state-of-the-art baselines in terms of average gender bias scores in PLMs while improving PLMs’ performance on downstream tasks solely using the downstream tasks’ dataset. Also, Gender-tuning is a deployable debiasing tool for any PLM that works with original fine-tuning.

pdf bib
Context Helps Determine Spatial Knowledge from Tweets
Zhaomin Xiao | Yan Huang | Eduardo Blanco
Findings of the Association for Computational Linguistics: IJCNLP-AACL 2023 (Findings)

2020

pdf bib abs
Learning Goal-oriented Dialogue Policy with opposite Agent Awareness
Zheng Zhang | Lizi Liao | Xiaoyan Zhu | Tat-Seng Chua | Zitao Liu | Yan Huang | Minlie Huang
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Most existing approaches for goal-oriented dialogue policy learning used reinforcement learning, which focuses on the target agent policy and simply treats the opposite agent policy as part of the environment. While in real-world scenarios, the behavior of an opposite agent often exhibits certain patterns or underlies hidden policies, which can be inferred and utilized by the target agent to facilitate its own decision making. This strategy is common in human mental simulation by first imaging a specific action and the probable results before really acting it. We therefore propose an opposite behavior aware framework for policy learning in goal-oriented dialogues. We estimate the opposite agent’s policy from its behavior and use this estimation to improve the target agent by regarding it as part of the target policy. We evaluate our model on both cooperative and competitive dialogue tasks, showing superior performance over state-of-the-art baselines.

pdf bib abs
Pointing to Select: A Fast Pointer-LSTM for Long Text Classification
Jinhua Du | Yan Huang | Karo Moilanen
Proceedings of the 28th International Conference on Computational Linguistics

Recurrent neural networks (RNNs) suffer from well-known limitations and complications which include slow inference and vanishing gradients when processing long sequences in text classification. Recent studies have attempted to accelerate RNNs via various ad hoc mechanisms to skip irrelevant words in the input. However, word skipping approaches proposed to date effectively stop at each or a given time step to decide whether or not a given input word should be skipped, breaking the coherence of input processing in RNNs. Furthermore, current methods cannot change skip rates during inference and are consequently unable to support different skip rates in demanding real-world conditions. To overcome these limitations, we propose Pointer- LSTM, a novel LSTM framework which relies on a pointer network to select important words for target prediction. The model maintains a coherent input process for the LSTM modules and makes it possible to change the skip rate during inference. Our evaluation on four public data sets demonstrates that Pointer-LSTM (a) is 1.1x∼3.5x faster than the standard LSTM architecture; (b) is more accurate than Leap-LSTM (the state-of-the-art LSTM skipping model) at high skip rates; and (c) reaches robust accuracy levels even when the skip rate is changed during inference.