2024
pdf
bib
abs
Translate-and-Revise: Boosting Large Language Models for Constrained Translation
Pengcheng Huang
|
Yongyu Mu
|
Yuzhang Wu
|
Bei Li
|
Chunyang Xiao
|
Tong Xiao
|
Jingbo Zhu
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
Imposing constraints on machine translation systems presents a challenging issue because these systems are not trained to make use of constraints in generating adequate, fluent translations. In this paper, we leverage the capabilities of large language models (LLMs) for constrained translation, given that LLMs can easily adapt to this task by taking translation instructions and constraints as prompts. However, LLMs cannot always guarantee the adequacy of translation and, in some cases, ignore the given constraints. This is in part because LLMs might be overly confident in their predictions, overriding the influence of the constraints. To overcome this overriding behaviour, we propose adding a revision process that encourages LLMs to correct their outputs by prompting them about the constraints that have not yet been met. We evaluate our approach on four constrained translation tasks, encompassing both lexical and structural constraints in multiple constraint domains. Experiments show a 15% improvement in constraint-based translation accuracy over standard LLMs, and the approach also significantly outperforms state-of-the-art neural machine translation (NMT) methods.
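The abstract suggests a simple prompt, check, and re-prompt loop. Below is a minimal sketch of that idea in Python; the call_llm() function, the prompt wording, and the substring check for constraint satisfaction are hypothetical placeholders, not the authors' actual setup.

# Minimal sketch of a translate-then-revise loop for constrained translation,
# following the abstract. call_llm() is a placeholder for any LLM client.

def call_llm(prompt: str) -> str:
    """Placeholder: send a prompt to an LLM and return its text output."""
    raise NotImplementedError

def translate_and_revise(source: str, constraints: list[str], max_rounds: int = 2) -> str:
    # Round 0: ask for a translation that uses the given target-side terms.
    prompt = (
        f"Translate into English: {source}\n"
        f"The translation must contain these terms: {', '.join(constraints)}"
    )
    translation = call_llm(prompt)

    for _ in range(max_rounds):
        # Check which constraints the model ignored (naive substring check).
        missing = [c for c in constraints if c not in translation]
        if not missing:
            break
        # Revision round: point the model only at the unmet constraints.
        prompt = (
            f"Source: {source}\n"
            f"Draft translation: {translation}\n"
            f"Revise the draft so that it also contains: {', '.join(missing)}"
        )
        translation = call_llm(prompt)
    return translation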
pdf
bib
abs
Forgetting Curve: A Reliable Method for Evaluating Memorization Capability for Long-Context Models
Xinyu Liu
|
Runsong Zhao
|
Pengcheng Huang
|
Chunyang Xiao
|
Bei Li
|
Jingang Wang
|
Tong Xiao
|
JingBo Zhu
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Numerous recent works aim to extend the effective context length of language models, and various methods, tasks, and benchmarks exist to measure a model's effective memory length. However, through thorough investigation, we find limitations in the existing evaluations of model memory. In this work, we provide an extensive survey of these limitations and propose a new method, called the forgetting curve, to measure the memorization capability of long-context models. We show that the forgetting curve has the advantages of being robust to the tested corpus and the experimental settings, of not relying on prompts, and of being applicable to models of any size. We apply our forgetting curve to a large variety of models, covering both transformer and RNN/SSM based architectures. Our measurement provides empirical evidence for the effectiveness of transformer extension techniques while raising questions about the effective length of RNN/SSM based models. We also examine the differences between our measurement and existing benchmarks as well as popular metrics for various models.
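As a rough illustration of the kind of measurement the abstract describes, the Python sketch below probes how well a model can reproduce a segment it has already seen after an increasing amount of intervening text; the predict_greedy() hook and the exact probing protocol are my assumptions, not the paper's reference implementation.

# Sketch of a forgetting-curve style probe: copy accuracy of a previously
# seen segment as a function of the gap separating it from the query point.

import random

def predict_greedy(context: list[int], n: int) -> list[int]:
    """Placeholder: return the model's greedy continuation of length n."""
    raise NotImplementedError

def forgetting_curve(segment: list[int], filler_vocab: list[int], gaps: list[int]) -> dict[int, float]:
    """For each gap length, measure how accurately the segment is recalled."""
    curve = {}
    for gap in gaps:
        filler = [random.choice(filler_vocab) for _ in range(gap)]
        # Context = the segment, then `gap` unrelated tokens, then the start of
        # the segment as a cue; a model that "remembers" should complete it.
        cue_len = len(segment) // 2
        context = segment + filler + segment[:cue_len]
        target = segment[cue_len:]
        pred = predict_greedy(context, len(target))
        curve[gap] = sum(p == t for p, t in zip(pred, target)) / len(target)
    return curve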
2019
pdf
bib
abs
Grammatical Sequence Prediction for Real-Time Neural Semantic Parsing
Chunyang Xiao
|
Christoph Teichmann
|
Konstantine Arkoudas
Proceedings of the Workshop on Deep Learning and Formal Languages: Building Bridges
While sequence-to-sequence (seq2seq) models achieve state-of-the-art performance in many natural language processing tasks, they can be too slow for real-time applications. One performance bottleneck is predicting the most likely next token over a large vocabulary; methods to circumvent this bottleneck are a current research topic. We focus specifically on using seq2seq models for semantic parsing, where we observe that grammars often exist which specify valid formal representations of utterance semantics. By developing a generic approach for restricting the predictions of a seq2seq model to grammatically permissible continuations, we arrive at a widely applicable technique for speeding up semantic parsing. The technique leads to a 74% speed-up on an in-house dataset with a large vocabulary, compared to the same neural model without grammatical restrictions.
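The abstract's core mechanism, scoring only the next tokens that the grammar permits after the current prefix, can be sketched as follows in Python; allowed_next_tokens() and score_tokens() are hypothetical hooks standing in for a real grammar and a real seq2seq decoder.

# Sketch of grammar-restricted greedy decoding: at each step the decoder only
# scores tokens the grammar allows, which shrinks the output softmax and is
# where the reported speed-up comes from.

def allowed_next_tokens(prefix: list[str]) -> list[str]:
    """Placeholder: tokens the target grammar permits after `prefix`."""
    raise NotImplementedError

def score_tokens(prefix: list[str], candidates: list[str]) -> list[float]:
    """Placeholder: seq2seq decoder scores for just the candidate tokens."""
    raise NotImplementedError

def constrained_greedy_decode(max_len: int = 128, eos: str = "</s>") -> list[str]:
    prefix: list[str] = []
    for _ in range(max_len):
        candidates = allowed_next_tokens(prefix)
        scores = score_tokens(prefix, candidates)
        best = max(zip(candidates, scores), key=lambda cs: cs[1])[0]
        if best == eos:
            break
        prefix.append(best)
    return prefix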
2016
pdf
bib
Sequence-based Structured Prediction for Semantic Parsing
Chunyang Xiao
|
Marc Dymetman
|
Claire Gardent
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
pdf
bib
Orthogonality regularizer for question answering
Chunyang Xiao
|
Guillaume Bouchard
|
Marc Dymetman
|
Claire Gardent
Proceedings of the Fifth Joint Conference on Lexical and Computational Semantics
2015
pdf
bib
Reversibility reconsidered: finite-state factors for efficient probabilistic sampling in parsing and generation
Marc Dymetman
|
Sriram Venkatapathy
|
Chunyang Xiao
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing