Ivan Lee


2025

Optimizing Hidden Markov Language Models: An Empirical Study of Reparameterization and Initialization Techniques
Ivan Lee | Taylor Berg-Kirkpatrick
Findings of the Association for Computational Linguistics: NAACL 2025

Hidden Markov models (HMMs) are valuable for their ability to provide exact and tractable inference. However, learning an HMM in an unsupervised manner involves a non-convex optimization problem that is plagued by poor local optima. Recent work on scaling up HMMs to perform competitively as language models has indicated that this challenge only increases with larger hidden state sizes. Several techniques to address this problem have been proposed, but they have not been evaluated comprehensively. This study provides a comprehensive empirical analysis of two recent strategies that use neural networks to enhance HMM optimization: neural reparameterization and neural initialization. We find that (1) these techniques work effectively for scaled HMM language modeling, (2) linear reparameterizations can be as effective as non-linear ones, and (3) the strategies are complementary.
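
As a rough illustration of the linear reparameterization idea (a sketch under our own assumptions, not the authors' implementation; the class and method names are hypothetical), the transition and emission matrices can be produced from learned state and token embeddings rather than stored as free softmax parameters, while the forward algorithm still computes the exact log-likelihood in log space:

import torch
import torch.nn as nn

class ReparameterizedHMM(nn.Module):
    def __init__(self, num_states, vocab_size, embed_dim):
        super().__init__()
        self.state_emb = nn.Parameter(torch.randn(num_states, embed_dim))
        self.token_emb = nn.Parameter(torch.randn(vocab_size, embed_dim))
        self.trans_proj = nn.Linear(embed_dim, embed_dim, bias=False)

    def hmm_params(self):
        # Linear reparameterization: logits come from embedding interactions.
        trans_logits = self.state_emb @ self.trans_proj(self.state_emb).t()
        emit_logits = self.state_emb @ self.token_emb.t()
        return trans_logits.log_softmax(-1), emit_logits.log_softmax(-1)

    def forward(self, tokens):
        # Exact forward algorithm in log space: returns log p(tokens),
        # assuming a uniform initial state distribution.
        log_A, log_B = self.hmm_params()
        num_states = log_A.size(0)
        alpha = -torch.log(torch.tensor(float(num_states))) + log_B[:, tokens[0]]
        for t in tokens[1:]:
            alpha = torch.logsumexp(alpha.unsqueeze(1) + log_A, dim=0) + log_B[:, t]
        return torch.logsumexp(alpha, dim=0)

# Toy usage: train by minimizing negative log-likelihood of token sequences.
model = ReparameterizedHMM(num_states=128, vocab_size=10000, embed_dim=64)
nll = -model(torch.tensor([5, 42, 7]))
nll.backward()

Swapping trans_proj for a small MLP would give a non-linear variant; under either parameterization, gradients flow into shared embeddings, which is one way such reparameterizations can reshape the optimization landscape relative to directly learned softmax tables.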

2022

Masked Measurement Prediction: Learning to Jointly Predict Quantities and Units from Textual Context
Daniel Spokoyny | Ivan Lee | Zhao Jin | Taylor Berg-Kirkpatrick
Findings of the Association for Computational Linguistics: NAACL 2022

Physical measurements constitute a large portion of numbers in academic papers, engineering reports, and web tables. Current benchmarks fall short of properly evaluating the numeracy of pretrained language models on measurements, hindering research on developing new methods and applying them to numerical tasks. To that end, we introduce a novel task, Masked Measurement Prediction (MMP), where a model learns to reconstruct a number together with its associated unit given masked text. MMP is useful both for training new numerically informed models and for evaluating the numeracy of existing systems. To address this task, we introduce a new Generative Masked Measurement (GeMM) model that jointly learns to predict numbers along with their units. We perform fine-grained analyses comparing our model with various ablations and baselines. We use linear probing of traditional pretrained transformer models (RoBERTa) to show that they significantly underperform jointly trained number-unit models, highlighting the difficulty of this new task and the benefits of our proposed pretraining approach. We hope this framework accelerates progress towards building more robust numerical reasoning systems.
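
A minimal sketch of how the MMP setup can be framed at the data level (our own illustrative assumptions; the mask tokens, regex, and helper name are hypothetical, not the paper's): a number-unit span is masked out and both pieces are kept as a joint prediction target, with the unit typically treated as a classification label and the number regressed, often on a log scale.

import re

# Toy pattern covering a few units; a real pipeline would use a full unit inventory.
MEASUREMENT = re.compile(r"(?P<number>\d+(?:\.\d+)?)\s*(?P<unit>m|kg|s|Hz)\b")

def mask_measurement(text):
    # Replace the first number-unit span with mask tokens and keep the gold values.
    match = MEASUREMENT.search(text)
    if match is None:
        return None
    masked = text[:match.start()] + "[NUM] [UNIT]" + text[match.end():]
    target = {"number": float(match.group("number")), "unit": match.group("unit")}
    return masked, target

masked_text, target = mask_measurement("The beam spans 12.5 m across the channel.")
# masked_text: "The beam spans [NUM] [UNIT] across the channel."
# target:      {"number": 12.5, "unit": "m"}
# A jointly trained model then predicts both fields from the masked context,
# summing a unit-classification loss and a (log-scaled) number loss.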

HeLo: Learning-Free Lookahead Decoding for Conversation Infilling
Ivan Lee | Taylor Berg-Kirkpatrick
Findings of the Association for Computational Linguistics: EMNLP 2022

We propose Heuristic Guided Lookahead Decoding (HeLo), a novel decoding strategy for conversation infilling. Conversation infilling aims to generate a seamless bridge of utterances connecting a given pair of source and target utterances. HeLo does not require fine-tuning or extra models – only the generating model itself. Instead, HeLo leverages a greedy lookahead phase before committing to any token. The HeLo framework is simple and can augment conventional decoding strategies paired with any autoregressive language model. Smooth transitions between utterances are encouraged with an annealing schedule. Our experiments show HeLo outperforms several baselines when evaluated with both automatic and human evaluation metrics, which, we argue, are appropriate for the task.
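
To make the lookahead idea concrete, here is a minimal, learning-free sketch under our own assumptions (helo_step, next_token, and heuristic are hypothetical names, and the scoring rule is illustrative rather than the paper's exact formulation): each candidate next token from the model's top-k is extended by a short greedy rollout, and candidates are scored by a mix of model log-probability and a heuristic measuring proximity to the target utterance, with an annealing weight that shifts toward the heuristic as decoding approaches the target.

import math

def greedy_rollout(next_token, prefix, depth):
    # Extend the prefix greedily for a few steps using the same generating model.
    seq = list(prefix)
    for _ in range(depth):
        probs = next_token(seq)
        seq.append(max(probs, key=probs.get))
    return seq

def helo_step(next_token, heuristic, prefix, step, max_steps, top_k=5, depth=3):
    # next_token: callable returning {token: log-probability} for a prefix.
    # heuristic: callable scoring how close a rollout is to the target utterance.
    logprobs = next_token(prefix)
    candidates = sorted(logprobs, key=logprobs.get, reverse=True)[:top_k]
    weight = step / max_steps  # anneal: the heuristic matters more near the end
    best, best_score = None, -math.inf
    for tok in candidates:
        rollout = greedy_rollout(next_token, list(prefix) + [tok], depth)
        score = (1 - weight) * logprobs[tok] + weight * heuristic(rollout)
        if score > best_score:
            best, best_score = tok, score
    return best

# Toy usage with placeholder callables; a real setup would wrap an autoregressive
# LM and a similarity score against the target utterance.
vocab = ["hello", "how", "are", "you", "bye"]
next_token = lambda seq: {w: -float(len(w)) for w in vocab}
heuristic = lambda seq: 1.0 if seq[-1] == "you" else 0.0
print(helo_step(next_token, heuristic, ["hello"], step=1, max_steps=4))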