Ian Lane

2026

Improving the Faithfulness of LLM-based Abstractive Summarization with Span-level Unlikelihood Training
Sicong Huang | Qianqi Yan | Shengze Wang | Ian Lane
Proceedings of the 6th Workshop on Trustworthy NLP (TrustNLP 2026)

Abstractive summarization using large language models (LLMs) has become an essential tool for condensing information. Despite their ability to generate fluent summaries, these models often produce texts that are unfaithful to the original documents, manifested through hallucinations of specific words, phrases, or concepts. Current approaches to mitigating unfaithfulness typically involve post-processing corrections or contrastive learning from synthetically generated negative samples, which do not fully address the spectrum of errors that can arise in LLM-generated summaries. In this paper, we introduce a novel approach to fine-tune LLMs specifically to reduce the occurrence of unfaithful spans of text in generated summaries. We first annotate span-level hallucinations in LLM-generated summaries using automatic labeling with GPT-4. We then fine-tune the LLM using both summaries with no hallucinations and spans of hallucinated text to improve the faithfulness of the model. This paper introduces a dataset labeled to distinguish between faithful and unfaithful content and compares the performance of three techniques: gradient ascent, unlikelihood training, and task vector negation. Our experimental results show that unlikelihood training can effectively use span-level annotations to enhance summary faithfulness, reducing the proportion of summaries with hallucinations from 31% to 13% (a relative reduction of 58%) on the CNN summarization dataset and from 33% to 20% (a relative reduction of 39%) on the SAMSum dataset.

2025

UCSC at SemEval-2025 Task 8: Question Answering over Tabular Data
Neng Wan | Sicong Huang | Esha Ubale | Ian Lane
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Table question answering (Table QA) remains challenging due to the varied structures of tables and the complexity of queries, which often require specialized reasoning. We introduce a system that leverages large language models (LLMs) to generate executable code as an intermediate step for answering questions on tabular data. The methodology uniformly represents tables as dataframes and prompts an LLM to translate natural-language questions into code that can be executed on these tables. This approach addresses key challenges by handling diverse table formats and enhancing interpretability through code execution. Experimental results on the DataBench benchmarks demonstrate that the proposed code-then-execute approach achieves high accuracy. Moreover, by offloading computation to code execution, the system requires fewer LLM invocations, thereby improving efficiency. These findings highlight the effectiveness of an LLM-based coding approach for reliable, scalable, and interpretable Table QA.

UCSC at SemEval-2025 Task 3: Context, Models and Prompt Optimization for Automated Hallucination Detection in LLM Output
Sicong Huang | Jincheng He | Shiyuan Huang | Karthik Raja Anandan | Arkajyoti Chakraborty | Ian Lane
Proceedings of the 19th International Workshop on Semantic Evaluation (SemEval-2025)

Hallucinations pose a significant challenge for large language models when answering knowledge-intensive queries. As LLMs become more widely adopted, it is crucial not only to detect whether hallucinations occur but also to pinpoint where they arise. SemEval 2025 Task 3, Mu-SHROOM: Multilingual Shared-task on Hallucinations and Related Observable Overgeneration Mistakes, is a recent effort in this direction. This paper describes our solution to the shared task. We propose a framework that first retrieves relevant context, then identifies false content in the answer, and finally maps it back to spans. The process is further enhanced by automatically optimizing prompts. Our system achieves the highest overall performance, ranking #1 in average position across all languages.

2019

Understanding and conversing about dynamic scenes is one of the key capabilities of AI agents that navigate the environment and convey useful information to humans. Video question answering is a specific scenario of such AI-human interaction where an agent generates a natural language response to a question regarding the video of a dynamic scene. Incorporating features from multiple modalities, which often provide supplementary information, is one of the challenging aspects of video question answering. Furthermore, a question often concerns only a small segment of the video, hence encoding the entire video sequence using a recurrent neural network is not computationally efficient. Our proposed question-guided video representation module efficiently generates the token-level video summary guided by each word in the question. The learned representations are then fused with the question to generate the answer. Through empirical evaluation on the Audio Visual Scene-aware Dialog (AVSD) dataset, our proposed models in single-turn and multi-turn question answering achieve state-of-the-art performance on several automatic natural language generation evaluation metrics.

2018

Adversarial Learning of Task-Oriented Neural Dialog Models
Bing Liu | Ian Lane
Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue

In this work, we propose an adversarial learning method for reward estimation in reinforcement learning (RL) based task-oriented dialog models. Most current RL-based task-oriented dialog systems require access to a reward signal from either user feedback or user ratings. Such user ratings, however, may not always be consistent or available in practice. Furthermore, online dialog policy learning with RL typically requires a large number of queries to users and thus suffers from poor sample efficiency. To address these challenges, we propose an adversarial learning method to learn dialog rewards directly from dialog samples. Such rewards are further used to optimize the dialog policy with policy gradient based RL. In an evaluation in a restaurant search domain, we show that the proposed adversarial dialog learning method achieves an improved dialog success rate compared to strong baseline methods. We further discuss the covariate shift problem in online adversarial dialog learning and show how we can address it with partial access to user feedback.

End-to-End Learning of Task-Oriented Dialogs
Bing Liu | Ian Lane
Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop

In this thesis proposal, we address the limitations of the conventional pipeline design of task-oriented dialog systems and propose end-to-end learning solutions. We design a neural network based dialog system that is able to robustly track dialog state, interface with knowledge bases, and incorporate structured query results into system responses to successfully complete task-oriented dialogs. In learning such neural network based dialog systems, we propose hybrid offline training and online interactive learning methods. We introduce a multi-task learning method for pre-training the dialog agent in a supervised manner using task-oriented dialog corpora. The agent trained with supervision can be further improved by interacting with users and learning online from user demonstration and feedback with imitation and reinforcement learning. In addressing the sample efficiency issue with online policy learning, we further propose a method that combines the learning-from-user and learning-from-simulation approaches to improve online interactive learning efficiency.

Viewing machine translation as a structured classification problem has provided a gateway for a host of structured prediction techniques to enter the field. In particular, large-margin structured prediction methods for discriminative training of feature weights, such as the structured perceptron or MIRA, have started to match or exceed the performance of existing methods such as MERT. One issue with structured problems in general is the difficulty in obtaining fully structured labels, e.g., in machine translation, obtaining reference translations or parallel sentence corpora for arbitrary language pairs. Another issue, more specific to the translation domain, is the difficulty in online training of machine translation systems, since existing methods often require bilingual knowledge to correct translation output online. We propose a solution to these two problems, by demonstrating a way to incorporate binary-labeled feedback (i.e., feedback on whether a translation hypothesis is a “good” or understandable one or not), a form of supervision that can be easily integrated in an online manner, into a machine translation framework. Experimental results show marked improvement by incorporating binary feedback on unseen test data, with gains exceeding 5.5 BLEU points.

2011

Context-aware Language Modeling for Conversational Speech Translation
Avneesh Saluja | Ian Lane | Ying Zhang
Proceedings of Machine Translation Summit XIII: Papers

Unsupervised Vocabulary Selection for Domain-Independent Simultaneous Lecture Translation
Paul Maergner | Ian Lane | Alex Waibel
Proceedings of Machine Translation Summit XIII: Papers

Unsupervised vocabulary selection for simultaneous lecture translation
Paul Maergner | Kevin Kilgour | Ian Lane | Alex Waibel
Proceedings of the 8th International Workshop on Spoken Language Translation: Papers

In this work, we propose a novel method for vocabulary selection which enables simultaneous speech recognition systems for lectures to automatically adapt to the diverse topics that occur in educational and scientific lectures. Utilizing materials that are available before the lecture begins, such as lecture slides, our proposed framework iteratively searches for related documents on the World Wide Web and generates a lecture-specific vocabulary and language model based on the resulting documents. In this paper, we introduce a novel method for vocabulary selection where we rank vocabulary that occurs in the collected documents based on a relevance score which is calculated using a combination of word features. Vocabulary selection is a critical component for topic adaptation that has typically been overlooked in prior works. On the interACT German-English simultaneous lecture translation system our proposed approach significantly improved vocabulary coverage, reducing the out-of-vocabulary rate on average by 57.0% and up to 84.9%, compared to a lecture-independent baseline. Furthermore, our approach reduced the word error rate by up to 25.3% (on average 13.2% across all lectures), compared to a lecture-independent baseline.

2010

Tools for Collecting Speech Corpora via Mechanical-Turk
Ian Lane | Matthias Eck | Kay Rottmann | Alex Waibel
Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk

Real-time spoken language identification and recognition for speech-to-speech translation
Daniel Chung Yong Lim | Ian Lane | Alex Waibel
Proceedings of the 7th International Workshop on Spoken Language Translation: Papers

2009

2007

Bilingual-LSA Based LM Adaptation for Spoken Language Translation
Yik-Cheung Tam | Ian Lane | Tanja Schultz
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

A Log-Linear Block Transliteration Model based on Bi-Stream HMMs
Bing Zhao | Nguyen Bach | Ian Lane | Stephan Vogel
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

Improving spoken language translation by automatic disfluency removal: evidence from conversational speech transcripts
Sharath Rao | Ian Lane | Tanja Schultz
Proceedings of Machine Translation Summit XI: Papers

This paper describes the CMU-UKA statistical machine translation systems submitted to the IWSLT 2007 evaluation campaign. Systems were submitted for three language-pairs: Japanese→English, Chinese→English and Arabic→English. All systems were based on a common phrase-based SMT (statistical machine translation) framework but for each language-pair a specific research problem was tackled. For Japanese→English we focused on two problems: first, punctuation recovery, and second, how to incorporate topic-knowledge into the translation framework. Our Chinese→English submission focused on syntax-augmented SMT and for the Arabic→English task we focused on incorporating morphological-decomposition into the SMT framework. This research strategy enabled us to evaluate a wide variety of approaches which proved effective for the language pairs they were evaluated on.

2006
