Yuan Zhang


2024

pdf
Can Large Language Models Understand Context?
Yilun Zhu | Joel Ruben Antony Moniz | Shruti Bhargava | Jiarui Lu | Dhivya Piraviperumal | Site Li | Yuan Zhang | Hong Yu | Bo-Hsiang Tseng
Findings of the Association for Computational Linguistics: EACL 2024

Understanding context is key to understanding human language, an ability which Large Language Models (LLMs) have been increasingly seen to demonstrate to an impressive extent. However, though the evaluation of LLMs encompasses various domains within the realm of Natural Language Processing, limited attention has been paid to probing their linguistic capability of understanding contextual features. This paper introduces a context understanding benchmark by adapting existing datasets to suit the evaluation of generative models. This benchmark comprises of four distinct tasks and nine datasets, all featuring prompts designed to assess the models’ ability to understand context. First, we evaluate the performance of LLMs under the in-context learning pretraining scenario. Experimental results indicate that pre-trained dense models struggle with understanding more nuanced contextual features when compared to state-of-the-art fine-tuned models. Second, as LLM compression holds growing significance in both research and real-world applications, we assess the context understanding of quantized models under in-context-learning settings. We find that 3-bit post-training quantization leads to varying degrees of performance reduction on our benchmark. We conduct an extensive analysis of these scenarios to substantiate our experimental results.

pdf
LJPCheck: Functional Tests for Legal Judgment Prediction
Yuan Zhang | Wanhong Huang | Yi Feng | Chuanyi Li | Zhiwei Fei | Jidong Ge | Bin Luo | Vincent Ng
Findings of the Association for Computational Linguistics ACL 2024

Legal Judgment Prediction (LJP) refers to the task of automatically predicting judgment results (e.g., charges, law articles and term of penalty) given the fact description of cases. While SOTA models have achieved high accuracy and F1 scores on public datasets, existing datasets fail to evaluate specific aspects of these models (e.g., legal fairness, which significantly impact their applications in real scenarios). Inspired by functional testing in software engineering, we introduce LJPCHECK, a suite of functional tests for LJP models, to comprehend LJP models’ behaviors and offer diagnostic insights. We illustrate the utility of LJPCHECK on five SOTA LJP models. Extensive experiments reveal vulnerabilities in these models, prompting an in-depth discussion into the underlying reasons of their shortcomings.

2023

pdf
MARRS: Multimodal Reference Resolution System
Halim Cagri Ates | Shruti Bhargava | Site Li | Jiarui Lu | Siddhardha Maddula | Joel Ruben Antony Moniz | Anil Kumar Nalamalapu | Roman Hoang Nguyen | Melis Ozyildirim | Alkesh Patel | Dhivya Piraviperumal | Vincent Renkens | Ankit Samal | Thy Tran | Bo-Hsiang Tseng | Hong Yu | Yuan Zhang | Shirley Zou
Proceedings of The Sixth Workshop on Computational Models of Reference, Anaphora and Coreference (CRAC 2023)

2022

pdf
Mask the Correct Tokens: An Embarrassingly Simple Approach for Error Correction
Kai Shen | Yichong Leng | Xu Tan | Siliang Tang | Yuan Zhang | Wenjie Liu | Edward Lin
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Text error correction aims to correct the errors in text sequences such as those typed by humans or generated by speech recognition models.Previous error correction methods usually take the source (incorrect) sentence as encoder input and generate the target (correct) sentence through the decoder. Since the error rate of the incorrect sentence is usually low (e.g., 10%), the correction model can only learn to correct on limited error tokens but trivially copy on most tokens (correct tokens), which harms the effective training of error correction. In this paper, we argue that the correct tokens should be better utilized to facilitate effective training and then propose a simple yet effective masking strategy to achieve this goal.Specifically, we randomly mask out a part of the correct tokens in the source sentence and let the model learn to not only correct the original error tokens but also predict the masked tokens based on their context information. Our method enjoys several advantages: 1) it alleviates trivial copy; 2) it leverages effective training signals from correct tokens; 3) it is a plug-and-play module and can be applied to different models and tasks. Experiments on spelling error correction and speech recognition error correction on Mandarin datasets and grammar error correction on English datasets with both autoregressive and non-autoregressive generation models show that our method improves the correctionaccuracy consistently.

2021

pdf
QA-Driven Zero-shot Slot Filling with Weak Supervision Pretraining
Xinya Du | Luheng He | Qi Li | Dian Yu | Panupong Pasupat | Yuan Zhang
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Slot-filling is an essential component for building task-oriented dialog systems. In this work, we focus on the zero-shot slot-filling problem, where the model needs to predict slots and their values, given utterances from new domains without training on the target domain. Prior methods directly encode slot descriptions to generalize to unseen slot types. However, raw slot descriptions are often ambiguous and do not encode enough semantic information, limiting the models’ zero-shot capability. To address this problem, we introduce QA-driven slot filling (QASF), which extracts slot-filler spans from utterances with a span-based QA model. We use a linguistically motivated questioning strategy to turn descriptions into questions, allowing the model to generalize to unseen slot types. Moreover, our QASF model can benefit from weak supervision signals from QA pairs synthetically generated from unlabeled conversations. Our full system substantially outperforms baselines by over 5% on the SNIPS benchmark.

pdf
Few-shot Intent Classification and Slot Filling with Retrieved Examples
Dian Yu | Luheng He | Yuan Zhang | Xinya Du | Panupong Pasupat | Qi Li
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Few-shot learning arises in important practical scenarios, such as when a natural language understanding system needs to learn new semantic labels for an emerging, resource-scarce domain. In this paper, we explore retrieval-based methods for intent classification and slot filling tasks in few-shot settings. Retrieval-based methods make predictions based on labeled examples in the retrieval index that are similar to the input, and thus can adapt to new domains simply by changing the index without having to retrain the model. However, it is non-trivial to apply such methods on tasks with a complex label space like slot filling. To this end, we propose a span-level retrieval method that learns similar contextualized representations for spans with the same label via a novel batch-softmax objective. At inference time, we use the labels of the retrieved spans to construct the final structure with the highest aggregated score. Our method outperforms previous systems in various few-shot settings on the CLINC and SNIPS benchmarks.

pdf
Noise Robust Named Entity Understanding for Voice Assistants
Deepak Muralidharan | Joel Ruben Antony Moniz | Sida Gao | Xiao Yang | Justine Kao | Stephen Pulman | Atish Kothari | Ray Shen | Yinying Pan | Vivek Kaul | Mubarak Seyed Ibrahim | Gang Xiang | Nan Dun | Yidan Zhou | Andy O | Yuan Zhang | Pooja Chitkara | Xuan Wang | Alkesh Patel | Kushal Tayal | Roger Zheng | Peter Grasch | Jason D Williams | Lin Li
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Papers

Named Entity Recognition (NER) and Entity Linking (EL) play an essential role in voice assistant interaction, but are challenging due to the special difficulties associated with spoken user queries. In this paper, we propose a novel architecture that jointly solves the NER and EL tasks by combining them in a joint reranking module. We show that our proposed framework improves NER accuracy by up to 3.13% and EL accuracy by up to 3.6% in F1 score. The features used also lead to better accuracies in other natural language understanding tasks, such as domain classification and semantic parsing.

pdf
Controllable Semantic Parsing via Retrieval Augmentation
Panupong Pasupat | Yuan Zhang | Kelvin Guu
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

In practical applications of semantic parsing, we often want to rapidly change the behavior of the parser, such as enabling it to handle queries in a new domain, or changing its predictions on certain targeted queries. While we can introduce new training examples exhibiting the target behavior, a mechanism for enacting such behavior changes without expensive model re-training would be preferable. To this end, we propose ControllAble Semantic Parser via Exemplar Retrieval (CASPER). Given an input query, the parser retrieves related exemplars from a retrieval index, augments them to the query, and then applies a generative seq2seq model to produce an output parse. The exemplars act as a control mechanism over the generic generative model: by manipulating the retrieval index or how the augmented query is constructed, we can manipulate the behavior of the parser. On the MTOP dataset, in addition to achieving state-of-the-art on the standard setup, we show that CASPER can parse queries in a new domain, adapt the prediction toward the specified patterns, or adapt to new semantic schemas without having to further re-train the model.

2020

pdf
Mapping Natural Language Instructions to Mobile UI Action Sequences
Yang Li | Jiacong He | Xin Zhou | Yuan Zhang | Jason Baldridge
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

We present a new problem: grounding natural language instructions to mobile user interface actions, and create three new datasets for it. For full task evaluation, we create PixelHelp, a corpus that pairs English instructions with actions performed by people on a mobile UI emulator. To scale training, we decouple the language and action data by (a) annotating action phrase spans in How-To instructions and (b) synthesizing grounded descriptions of actions for mobile user interfaces. We use a Transformer to extract action phrase tuples from long-range natural language instructions. A grounding Transformer then contextually represents UI objects using both their content and screen position and connects them to object descriptions. Given a starting screen and instruction, our model achieves 70.59% accuracy on predicting complete ground-truth action sequences in PixelHelp.

pdf
LIT Team’s System Description for Japanese-Chinese Machine Translation Task in IWSLT 2020
Yimeng Zhuang | Yuan Zhang | Lijie Wang
Proceedings of the 17th International Conference on Spoken Language Translation

This paper describes the LIT Team’s submission to the IWSLT2020 open domain translation task, focusing primarily on Japanese-to-Chinese translation direction. Our system is based on the organizers’ baseline system, but we do more works on improving the Transform baseline system by elaborate data pre-processing. We manage to obtain significant improvements, and this paper aims to share some data processing experiences in this translation task. Large-scale back-translation on monolingual corpus is also investigated. In addition, we also try shared and exclusive word embeddings, compare different granularity of tokens like sub-word level. Our Japanese-to-Chinese translation system achieves a performance of BLEU=34.0 and ranks 2nd among all participating systems.

2019

pdf
Large-Scale Representation Learning from Visually Grounded Untranscribed Speech
Gabriel Ilharco | Yuan Zhang | Jason Baldridge
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

Systems that can associate images with their spoken audio captions are an important step towards visually grounded language learning. We describe a scalable method to automatically generate diverse audio for image captioning datasets. This supports pretraining deep networks for encoding both audio and images, which we do via a dual encoder that learns to align latent representations from both modalities. We show that a masked margin softmax loss for such models is superior to the standard triplet loss. We fine-tune these models on the Flickr8k Audio Captions Corpus and obtain state-of-the-art results—improving recall in the top 10 from 29.6% to 49.5%. We also obtain human ratings on retrieval outputs to better assess the impact of incidentally matching image-caption pairs that were not associated in the data, finding that automatic evaluation substantially underestimates the quality of the retrieved results.

pdf
PAWS-X: A Cross-lingual Adversarial Dataset for Paraphrase Identification
Yinfei Yang | Yuan Zhang | Chris Tar | Jason Baldridge
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Most existing work on adversarial data generation focuses on English. For example, PAWS (Paraphrase Adversaries from Word Scrambling) consists of challenging English paraphrase identification pairs from Wikipedia and Quora. We remedy this gap with PAWS-X, a new dataset of 23,659 human translated PAWS evaluation pairs in six typologically distinct languages: French, Spanish, German, Chinese, Japanese, and Korean. We provide baseline numbers for three models with different capacity to capture non-local context and sentence structure, and using different multilingual training and evaluation regimes. Multilingual BERT fine-tuned on PAWS English plus machine-translated data performs the best, with a range of 83.1-90.8 accuracy across the non-English languages and an average accuracy gain of 23% over the next best model. PAWS-X shows the effectiveness of deep, multilingual pre-training while also leaving considerable headroom as a new challenge to drive multilingual research that better captures structure and contextual information.

pdf
Tree Communication Models for Sentiment Analysis
Yuan Zhang | Yue Zhang
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

Tree-LSTMs have been used for tree-based sentiment analysis over Stanford Sentiment Treebank, which allows the sentiment signals over hierarchical phrase structures to be calculated simultaneously. However, traditional tree-LSTMs capture only the bottom-up dependencies between constituents. In this paper, we propose a tree communication model using graph convolutional neural network and graph recurrent neural network, which allows rich information exchange between phrases constituent tree. Experiments show that our model outperforms existing work on bidirectional tree-LSTMs in both accuracy and efficiency, providing more consistent predictions on phrase-level sentiments.

pdf
PAWS: Paraphrase Adversaries from Word Scrambling
Yuan Zhang | Jason Baldridge | Luheng He
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Existing paraphrase identification datasets lack sentence pairs that have high lexical overlap without being paraphrases. Models trained on such data fail to distinguish pairs like flights from New York to Florida and flights from Florida to New York. This paper introduces PAWS (Paraphrase Adversaries from Word Scrambling), a new dataset with 108,463 well-formed paraphrase and non-paraphrase pairs with high lexical overlap. Challenging pairs are generated by controlled word swapping and back translation, followed by fluency and paraphrase judgments by human raters. State-of-the-art models trained on existing datasets have dismal performance on PAWS (<40% accuracy); however, including PAWS training data for these models improves their accuracy to 85% while maintaining performance on existing tasks. In contrast, models that do not capture non-local contextual information fail even with PAWS training examples. As such, PAWS provides an effective instrument for driving further progress on models that better exploit structure, context, and pairwise comparisons.

2018

pdf
Points, Paths, and Playscapes: Large-scale Spatial Language Understanding Tasks Set in the Real World
Jason Baldridge | Tania Bedrax-Weiss | Daphne Luong | Srini Narayanan | Bo Pang | Fernando Pereira | Radu Soricut | Michael Tseng | Yuan Zhang
Proceedings of the First International Workshop on Spatial Language Understanding

Spatial language understanding is important for practical applications and as a building block for better abstract language understanding. Much progress has been made through work on understanding spatial relations and values in images and texts as well as on giving and following navigation instructions in restricted domains. We argue that the next big advances in spatial language understanding can be best supported by creating large-scale datasets that focus on points and paths based in the real world, and then extending these to create online, persistent playscapes that mix human and bot players, where the bot players must learn, evolve, and survive according to their depth of understanding of scenes, navigation, and interactions.

pdf
A Fast, Compact, Accurate Model for Language Identification of Codemixed Text
Yuan Zhang | Jason Riesa | Daniel Gillick | Anton Bakalov | Jason Baldridge | David Weiss
Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing

We address fine-grained multilingual language identification: providing a language code for every token in a sentence, including codemixed text containing multiple languages. Such text is prevalent online, in documents, social media, and message boards. We show that a feed-forward network with a simple globally constrained decoder can accurately and rapidly label both codemixed and monolingual text in 100 languages and 100 language pairs. This model outperforms previously published multilingual approaches in terms of both accuracy and speed, yielding an 800x speed-up and a 19.5% averaged absolute gain on three codemixed datasets. It furthermore outperforms several benchmark systems on monolingual language identification.

2017

pdf
Aspect-augmented Adversarial Networks for Domain Adaptation
Yuan Zhang | Regina Barzilay | Tommi Jaakkola
Transactions of the Association for Computational Linguistics, Volume 5

We introduce a neural method for transfer learning between two (source and target) classification tasks or aspects over the same domain. Rather than training on target labels, we use a few keywords pertaining to source and target aspects indicating sentence relevance instead of document class labels. Documents are encoded by learning to embed and softly select relevant sentences in an aspect-dependent manner. A shared classifier is trained on the source encoded documents and labels, and applied to target encoded documents. We ensure transfer through aspect-adversarial training so that encoded documents are, as sets, aspect-invariant. Experimental results demonstrate that our approach outperforms different baselines and model variants on two datasets, yielding an improvement of 27% on a pathology dataset and 5% on a review dataset.

2016

pdf
Stack-propagation: Improved Representation Learning for Syntax
Yuan Zhang | David Weiss
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Ten Pairs to Tag – Multilingual POS Tagging via Coarse Mapping between Embeddings
Yuan Zhang | David Gaddy | Regina Barzilay | Tommi Jaakkola
Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2015

pdf
Hierarchical Low-Rank Tensors for Multilingual Transfer Parsing
Yuan Zhang | Regina Barzilay
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf
Randomized Greedy Inference for Joint Segmentation, POS Tagging and Dependency Parsing
Yuan Zhang | Chengtao Li | Regina Barzilay | Kareem Darwish
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

pdf
High-Order Low-Rank Tensors for Semantic Role Labeling
Tao Lei | Yuan Zhang | Lluís Màrquez | Alessandro Moschitti | Regina Barzilay
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2014

pdf
Greed is Good if Randomized: New Inference for Dependency Parsing
Yuan Zhang | Tao Lei | Regina Barzilay | Tommi Jaakkola
Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP)

pdf
Steps to Excellence: Simple Inference with Refined Scoring of Dependency Trees
Yuan Zhang | Tao Lei | Regina Barzilay | Tommi Jaakkola | Amir Globerson
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Low-Rank Tensors for Scoring Dependency Structures
Tao Lei | Yu Xin | Yuan Zhang | Regina Barzilay | Tommi Jaakkola
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2013

pdf
Transfer Learning for Constituency-Based Grammars
Yuan Zhang | Regina Barzilay | Amir Globerson
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

pdf
Learning to Map into a Universal POS Tagset
Yuan Zhang | Roi Reichart | Regina Barzilay | Amir Globerson
Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning