Li Zhang


2022

pdf
Is “My Favorite New Movie” My Favorite Movie? Probing the Understanding of Recursive Noun Phrases
Qing Lyu | Zheng Hua | Daoxin Li | Li Zhang | Marianna Apidianaki | Chris Callison-Burch
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Recursive noun phrases (NPs) have interesting semantic properties. For example, “my favorite new movie” is not necessarily my favorite movie, whereas “my new favorite movie” is. This is common sense to humans, yet it is unknown whether language models have such knowledge. We introduce the Recursive Noun Phrase Challenge (RNPC), a dataset of three textual inference tasks involving textual entailment and event plausibility comparison, precisely targeting the understanding of recursive NPs. When evaluated on RNPC, state-of-the-art Transformer models only perform around chance. Still, we show that such knowledge is learnable with appropriate data. We further probe the models for relevant linguistic features that can be learned from our tasks, including modifier semantic category and modifier scope. Finally, models trained on RNPC achieve strong zero-shot performance on an extrinsic Harm Detection evaluation task, showing the usefulness of the understanding of recursive NPs in downstream applications.

pdf
Label Definitions Improve Semantic Role Labeling
Li Zhang | Ishan Jindal | Yunyao Li
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Argument classification is at the core of Semantic Role Labeling. Given a sentence and the predicate, a semantic role label is assigned to each argument of the predicate. While semantic roles come with meaningful definitions, existing work has treated them as symbolic. Learning symbolic labels usually requires ample training data, which is frequently unavailable due to the cost of annotation. We instead propose to retrieve and leverage the definitions of these labels from the annotation guidelines. For example, the verb predicate “work” has arguments defined as “worker”, “job”, “employer”, etc. Our model achieves state-of-the-art performance on the CoNLL09 dataset injected with label definitions given the predicate senses. The performance improvement is even more pronounced in low-resource settings when training data is scarce.

pdf
Show Me More Details: Discovering Hierarchies of Procedures from Semi-structured Web Data
Shuyan Zhou | Li Zhang | Yue Yang | Qing Lyu | Pengcheng Yin | Chris Callison-Burch | Graham Neubig
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Procedures are inherently hierarchical. To “make videos”, one may need to “purchase a camera”, which in turn may require one to “set a budget”. While such hierarchical knowledge is critical for reasoning about complex procedures, most existing work has treated procedures as shallow structures without modeling the parent-child relation. In this work, we attempt to construct an open-domain hierarchical knowledge-base (KB) of procedures based on wikiHow, a website containing more than 110k instructional articles, each documenting the steps to carry out a complex procedure. To this end, we develop a simple and efficient method that links steps (e.g., “purchase a camera”) in an article to other articles with similar goals (e.g., “how to choose a camera”), recursively constructing the KB. Our method significantly outperforms several strong baselines according to automatic evaluation, human judgment, and application to downstream tasks such as instructional video retrieval.

pdf
Unsupervised Entity Linking with Guided Summarization and Multiple-Choice Selection
Young Min Cho | Li Zhang | Chris Callison-Burch
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Entity linking, the task of linking potentially ambiguous mentions in texts to corresponding knowledge-base entities, is an important component for language understanding. We address two challenge in entity linking: how to leverage wider contexts surrounding a mention, and how to deal with limited training data. We propose a fully unsupervised model called SumMC that first generates a guided summary of the contexts conditioning on the mention, and then casts the task to a multiple-choice problem where the model chooses an entity from a list of candidates. In addition to evaluating our model on existing datasets that focus on named entities, we create a new dataset that links noun phrases from WikiHow to Wikidata. We show that our SumMC model achieves state-of-the-art unsupervised performance on our new dataset and on exiting datasets.

2021

pdf
Goal-Oriented Script Construction
Qing Lyu | Li Zhang | Chris Callison-Burch
Proceedings of the 14th International Conference on Natural Language Generation

The knowledge of scripts, common chains of events in stereotypical scenarios, is a valuable asset for task-oriented natural language understanding systems. We propose the Goal-Oriented Script Construction task, where a model produces a sequence of steps to accomplish a given goal. We pilot our task on the first multilingual script learning dataset supporting 18 languages collected from wikiHow, a website containing half a million how-to articles. For baselines, we consider both a generation-based approach using a language model and a retrieval-based approach by first retrieving the relevant steps from a large candidate pool and then ordering them. We show that our task is practical, feasible but challenging for state-of-the-art Transformer models, and that our methods can be readily deployed for various other datasets and domains with decent zero-shot performance.

pdf
Visual Goal-Step Inference using wikiHow
Yue Yang | Artemis Panagopoulou | Qing Lyu | Li Zhang | Mark Yatskar | Chris Callison-Burch
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

Understanding what sequence of steps are needed to complete a goal can help artificial intelligence systems reason about human activities. Past work in NLP has examined the task of goal-step inference for text. We introduce the visual analogue. We propose the Visual Goal-Step Inference (VGSI) task, where a model is given a textual goal and must choose which of four images represents a plausible step towards that goal. With a new dataset harvested from wikiHow consisting of 772,277 images representing human actions, we show that our task is challenging for state-of-the-art multimodal models. Moreover, the multimodal representation learned from our data can be effectively transferred to other datasets like HowTo100m, increasing the VGSI accuracy by 15 - 20%. Our task will facilitate multimodal reasoning about procedural events.

pdf
Complementary Evidence Identification in Open-Domain Question Answering
Xiangyang Mou | Mo Yu | Shiyu Chang | Yufei Feng | Li Zhang | Hui Su
Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume

This paper proposes a new problem of complementary evidence identification for open-domain question answering (QA). The problem aims to efficiently find a small set of passages that covers full evidence from multiple aspects as to answer a complex question. To this end, we proposes a method that learns vector representations of passages and models the sufficiency and diversity within the selected set, in addition to the relevance between the question and passages. Our experiments demonstrate that our method considers the dependence within the supporting evidence and significantly improves the accuracy of complementary evidence selection in QA domain.

pdf
Multi-Level Gazetteer-Free Geocoding
Sayali Kulkarni | Shailee Jain | Mohammad Javad Hosseini | Jason Baldridge | Eugene Ie | Li Zhang
Proceedings of Second International Combined Workshop on Spatial Language Understanding and Grounded Communication for Robotics

We present a multi-level geocoding model (MLG) that learns to associate texts to geographic coordinates. The Earth’s surface is represented using space-filling curves that decompose the sphere into a hierarchical grid. MLG balances classification granularity and accuracy by combining losses across multiple levels and jointly predicting cells at different levels simultaneously. It obtains large gains without any gazetteer metadata, demonstrating that it can effectively learn the connection between text spans and coordinates—and thus makes it a gazetteer-free geocoder. Furthermore, MLG obtains state-of-the-art results for toponym resolution on three English datasets without any dataset-specific tuning.

2020

pdf
Intent Detection with WikiHow
Li Zhang | Qing Lyu | Chris Callison-Burch
Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing

Modern task-oriented dialog systems need to reliably understand users’ intents. Intent detection is even more challenging when moving to new domains or new languages, since there is little annotated data. To address this challenge, we present a suite of pretrained intent detection models which can predict a broad range of intended goals from many actions because they are trained on wikiHow, a comprehensive instructional website. Our models achieve state-of-the-art results on the Snips dataset, the Schema-Guided Dialogue dataset, and all 3 languages of the Facebook multilingual dialog datasets. Our models also demonstrate strong zero- and few-shot performance, reaching over 75% accuracy using only 100 training examples in all datasets.

pdf
SmartCiteCon: Implicit Citation Context Extraction from Academic Literature Using Supervised Learning
Chenrui Guo | Haoran Cui | Li Zhang | Jiamin Wang | Wei Lu | Jian Wu
Proceedings of the 8th International Workshop on Mining Scientific Publications

We introduce SmartCiteCon (SCC), a Java API for extracting both explicit and implicit citation context from academic literature in English. The tool is built on a Support Vector Machine (SVM) model trained on a set of 7,058 manually annotated citation context sentences, curated from 34,000 papers from the ACL Anthology. The model with 19 features achieves F1=85.6%. SCC supports PDF, XML, and JSON files out-of-box, provided that they are conformed to certain schemas. The API supports single document processing and batch processing in parallel. It takes about 12–45 seconds on average depending on the format to process a document on a dedicated server with 6 multithreaded cores. Using SCC, we extracted 11.8 million citation context sentences from ~33.3k PMC papers in the CORD-19 dataset, released on June 13, 2020. We will provide continuous supplementary data contribution to the CORD-19 and other datasets. The source code is released at https://gitee.com/irlab/SmartCiteCon.

pdf
Small but Mighty: New Benchmarks for Split and Rephrase
Li Zhang | Huaiyu Zhu | Siddhartha Brahma | Yunyao Li
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Split and Rephrase is a text simplification task of rewriting a complex sentence into simpler ones. As a relatively new task, it is paramount to ensure the soundness of its evaluation benchmark and metric. We find that the widely used benchmark dataset universally contains easily exploitable syntactic cues caused by its automatic generation process. Taking advantage of such cues, we show that even a simple rule-based model can perform on par with the state-of-the-art model. To remedy such limitations, we collect and release two crowdsourced benchmark datasets. We not only make sure that they contain significantly more diverse syntax, but also carefully control for their quality according to a well-defined set of criteria. While no satisfactory automatic metric exists, we apply fine-grained manual evaluation based on these criteria using crowdsourcing, showing that our datasets better represent the task and are significantly more challenging for the models.

pdf
Reasoning about Goals, Steps, and Temporal Ordering with WikiHow
Li Zhang | Qing Lyu | Chris Callison-Burch
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

We propose a suite of reasoning tasks on two types of relations between procedural events: goal-step relations (“learn poses” is a step in the larger goal of “doing yoga”) and step-step temporal relations (“buy a yoga mat” typically precedes “learn poses”). We introduce a dataset targeting these two relations based on wikiHow, a website of instructional how-to articles. Our human-validated test set serves as a reliable benchmark for common-sense inference, with a gap of about 10% to 20% between the performance of state-of-the-art transformer models and human performance. Our automatically-generated training set allows models to effectively transfer to out-of-domain tasks requiring knowledge of procedural events, with greatly improved performances on SWAG, Snips, and Story Cloze Test in zero- and few-shot settings.

2019

pdf
Multi-Label Transfer Learning for Multi-Relational Semantic Similarity
Li Zhang | Steven Wilson | Rada Mihalcea
Proceedings of the Eighth Joint Conference on Lexical and Computational Semantics (*SEM 2019)

Multi-relational semantic similarity datasets define the semantic relations between two short texts in multiple ways, e.g., similarity, relatedness, and so on. Yet, all the systems to date designed to capture such relations target one relation at a time. We propose a multi-label transfer learning approach based on LSTM to make predictions for several relations simultaneously and aggregate the losses to update the parameters. This multi-label regression approach jointly learns the information provided by the multiple relations, rather than treating them as separate tasks. Not only does this approach outperform the single-task approach and the traditional multi-task learning approach, but it also achieves state-of-the-art performance on all but one relation of the Human Activity Phrase dataset.

pdf
CAN: Constrained Attention Networks for Multi-Aspect Sentiment Analysis
Mengting Hu | Shiwan Zhao | Li Zhang | Keke Cai | Zhong Su | Renhong Cheng | Xiaowei Shen
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)

Aspect level sentiment classification is a fine-grained sentiment analysis task. To detect the sentiment towards a particular aspect in a sentence, previous studies have developed various attention-based methods for generating aspect-specific sentence representations. However, the attention may inherently introduce noise and downgrade the performance. In this paper, we propose constrained attention networks (CAN), a simple yet effective solution, to regularize the attention for multi-aspect sentiment analysis, which alleviates the drawback of the attention mechanism. Specifically, we introduce orthogonal regularization on multiple aspects and sparse regularization on each single aspect. Experimental results on two public datasets demonstrate the effectiveness of our approach. We further extend our approach to multi-task settings and outperform the state-of-the-art methods.

2018

pdf
Improving Text-to-SQL Evaluation Methodology
Catherine Finegan-Dollak | Jonathan K. Kummerfeld | Li Zhang | Karthik Ramanathan | Sesh Sadasivam | Rui Zhang | Dragomir Radev
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

To be informative, an evaluation must measure how well systems generalize to realistic unseen data. We identify limitations of and propose improvements to current evaluations of text-to-SQL systems. First, we compare human-generated and automatically generated questions, characterizing properties of queries necessary for real-world applications. To facilitate evaluation on multiple datasets, we release standardized and improved versions of seven existing datasets and one new text-to-SQL dataset. Second, we show that the current division of data into training and test sets measures robustness to variations in the way questions are asked, but only partially tests how well systems generalize to new queries; therefore, we propose a complementary dataset split for evaluation of future work. Finally, we demonstrate how the common practice of anonymizing variables during evaluation removes an important challenge of the task. Our observations highlight key difficulties, and our methodology enables effective measurement of future development.

2012

pdf
Affect Detection from Semantic Interpretation of Drama Improvisation
Li Zhang | Ming Jiang
Proceedings of COLING 2012: Posters

2010

pdf
Metaphor Interpretation and Context-based Affect Detection
Li Zhang
Coling 2010: Posters

2007

pdf
Don’t worry about metaphor: affect detection for conversational agents
Catherine Smith | Timothy Rumbell | John Barnden | Robert Hendley | Mark Lee | Alan Wallington | Li Zhang
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

2006

pdf
Developments in Affect Detection in E-drama
Li Zhang | John A. Barnden | Robert J. Hendley | Alan M. Wallington
Demonstrations

pdf
Exploitation in Affect Detection in Open-Ended Improvisational Text
Li Zhang | John A. Barnden | Robert J. Hendley | Alan M. Wallington
Proceedings of the Workshop on Sentiment and Subjectivity in Text

pdf
Empirical Study on the Performance Stability of Named Entity Recognition Model across Domains
Hong Lei Guo | Li Zhang | Zhong Su
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing