2025
Improving Instruct Models for Free: A Study on Partial Adaptation
Ozan Irsoy | Pengxiang Cheng | Jennifer L Chen | Daniel Preotiuc-Pietro | Shiyue Zhang | Duccio Pappadopulo
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing
Instruct models, obtained from various instruction tuning or post-training steps, are commonly deemed superior and more usable than their base counterparts. While the model gains instruction-following ability, instruction tuning may lead to forgetting of knowledge from pre-training, or it may encourage the model to be overly conversational or verbose. This, in turn, can lead to degradation of in-context few-shot learning performance. In this work, we study the performance trajectory between base and instruct models by scaling down the strength of instruction tuning via the partial adaptation method. We show that, across several model families and model sizes, reducing the strength of instruction tuning results in material improvement on a few-shot in-context learning benchmark covering a variety of classic natural language tasks. This comes at the cost of losing some degree of instruction-following ability, as measured by AlpacaEval. Our study sheds light on the potential trade-off between in-context learning and instruction-following abilities that is worth considering in practice.
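The partial adaptation idea described above amounts to interpolating between base and instruct model weights. A minimal sketch, assuming the method can be expressed as theta_base + alpha * (theta_instruct - theta_base), where alpha scales the strength of instruction tuning (the function name and scalar "weights" here are illustrative, not the paper's code):

```python
def partially_adapt(base_weights, instruct_weights, alpha):
    """Return theta_base + alpha * (theta_instruct - theta_base) per parameter.

    alpha=1.0 recovers the instruct model, alpha=0.0 the base model;
    intermediate values scale down the strength of instruction tuning.
    """
    return {
        name: base + alpha * (instruct_weights[name] - base)
        for name, base in base_weights.items()
    }

# Toy example with scalar stand-ins for parameter tensors:
base = {"w": 0.0, "b": 1.0}
instruct = {"w": 2.0, "b": 3.0}
mixed = partially_adapt(base, instruct, alpha=0.5)  # halfway between the two
```

In practice the same interpolation would be applied tensor-by-tensor over a real model's state dict.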
2024
Unsupervised Contrast-Consistent Ranking with Language Models
Niklas Stoehr | Pengxiang Cheng | Jing Wang | Daniel Preotiuc-Pietro | Rajarshi Bhowmik
Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)
Language models contain ranking-based knowledge and are powerful solvers of in-context ranking tasks. For instance, they may have parametric knowledge about the ordering of countries by size or may be able to rank product reviews by sentiment. We compare pairwise, pointwise and listwise prompting techniques to elicit a language model’s ranking knowledge. However, we find that even with careful calibration and constrained decoding, prompting-based techniques may not always be self-consistent in the rankings they produce. This motivates us to explore an alternative approach that is inspired by an unsupervised probing method called Contrast-Consistent Search (CCS). The idea is to train a probe guided by a logical constraint: a language model’s representation of a statement and its negation must be mapped to contrastive true-false poles consistently across multiple statements. We hypothesize that similar constraints apply to ranking tasks where all items are related via consistent, pairwise or listwise comparisons. To this end, we extend the binary CCS method to Contrast-Consistent Ranking (CCR) by adapting existing ranking methods such as the Max-Margin Loss, Triplet Loss and an Ordinal Regression objective. Across different models and datasets, our results confirm that CCR probing performs better or, at least, on a par with prompting.
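Of the ranking objectives the abstract names, the max-margin variant is the simplest to sketch. Below is an illustrative hinge-loss formulation over a linear probe (plain-Python stand-in vectors rather than LM representations; all names are assumptions, not the paper's code):

```python
def score(w, x):
    """Linear probe: dot product of probe weights and a representation."""
    return sum(wi * xi for wi, xi in zip(w, x))

def max_margin_loss(w, pairs, margin=1.0):
    """Pairwise hinge loss: each (hi, lo) pair is penalized unless the
    higher-ranked item outscores the lower-ranked one by at least `margin`."""
    return sum(max(0.0, margin - (score(w, hi) - score(w, lo)))
               for hi, lo in pairs)

w = [1.0, 0.0]
satisfied = [([2.0, 0.0], [1.0, 0.0])]   # hi outscores lo by exactly the margin
violated = [([1.0, 0.0], [3.0, 0.0])]    # lo outscores hi
```

Training the probe would minimize this loss over many such pairs, pushing the scores toward a consistent ranking.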
2023
Overcoming Catastrophic Forgetting in Massively Multilingual Continual Learning
Genta Winata | Lingjue Xie | Karthik Radhakrishnan | Shijie Wu | Xisen Jin | Pengxiang Cheng | Mayank Kulkarni | Daniel Preotiuc-Pietro
Findings of the Association for Computational Linguistics: ACL 2023
Real-life multilingual systems should be able to efficiently incorporate new languages as data distributions fed to the system evolve and shift over time. To do this, systems need to handle the issue of catastrophic forgetting, where the model performance drops for languages or tasks seen further in its past. In this paper, we study catastrophic forgetting, as well as methods to minimize this, in a massively multilingual continual learning framework involving up to 51 languages and covering both classification and sequence labeling tasks. We present LR ADJUST, a learning rate scheduling method that is simple, yet effective in preserving new information without strongly overwriting past knowledge. Furthermore, we show that this method is effective across multiple continual learning approaches. Finally, we provide further insights into the dynamics of catastrophic forgetting in this massively multilingual setup.
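The abstract does not specify the LR ADJUST schedule itself. As a purely hypothetical illustration of the general idea of learning rate scheduling across continual-learning stages, one simple scheme shrinks the learning rate each time a new language or task is introduced, so that later updates overwrite less of what was learned before:

```python
def scheduled_lr(base_lr, stage, decay=0.5):
    """Hypothetical schedule: decay the learning rate geometrically per
    continual-learning stage (stage 0 = first language/task seen)."""
    return base_lr * (decay ** stage)

# Learning rates for the first three stages at base_lr=1e-3:
lrs = [scheduled_lr(1e-3, s) for s in range(3)]
```

This is only a sketch of the design space; the actual LR ADJUST method may differ.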
2018
Implicit Argument Prediction with Event Knowledge
Pengxiang Cheng | Katrin Erk
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
Implicit arguments are not syntactically connected to their predicates, and are therefore hard to extract. Previous work has used models with large numbers of features, evaluated on very small datasets. We propose to train models for implicit argument prediction on a simple cloze task, for which data can be generated automatically at scale. This allows us to use a neural model, which draws on narrative coherence and entity salience for predictions. We show that our model has superior performance on both synthetic and natural data.
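The cloze-style data generation described above can be sketched as follows: given a chain of events sharing an entity, hold one argument out and ask a model to recover it from the remaining context. The helper below is a hypothetical illustration, not the paper's pipeline:

```python
def make_cloze_instance(events, target_idx):
    """Build a cloze instance from a list of (predicate, argument) events.

    The argument at `target_idx` is masked out; the model's task is to
    predict it from the surrounding narrative context.
    Returns (context, answer).
    """
    context = [(pred, arg if i != target_idx else "___")
               for i, (pred, arg) in enumerate(events)]
    answer = events[target_idx][1]
    return context, answer

events = [("arrested", "suspect"), ("charged", "suspect"), ("convicted", "suspect")]
context, answer = make_cloze_instance(events, target_idx=1)
```

Because instances like this can be generated automatically from any corpus of event chains, training data is available at a scale that hand-annotated implicit-argument datasets cannot match.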
2016
Representing Meaning with a Combination of Logical and Distributional Models
I. Beltagy | Stephen Roller | Pengxiang Cheng | Katrin Erk | Raymond J. Mooney
Computational Linguistics, Volume 42, Issue 4 - December 2016