2024
ICXML: An In-Context Learning Framework for Zero-Shot Extreme Multi-Label Classification
Yaxin Zhu | Hamed Zamani
Findings of the Association for Computational Linguistics: NAACL 2024
This paper focuses on the task of Extreme Multi-Label Classification (XMC), whose goal is to predict multiple labels for each instance from an extremely large label space. While existing research has primarily focused on fully supervised XMC, real-world scenarios often lack supervision signals, highlighting the importance of zero-shot settings. Given the large label space, utilizing in-context learning approaches is not trivial. We address this issue by introducing In-Context Extreme Multi-label Learning (ICXML), a two-stage framework that cuts down the search space by generating a set of candidate labels through in-context learning and then reranks them. Extensive experiments suggest that ICXML advances the state of the art on two diverse public benchmarks.
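To make the two-stage idea concrete, here is a minimal Python sketch of a generate-then-rerank pipeline. The function names, the comma-separated label prompt, and the embedding-similarity reranker are illustrative assumptions, not the paper's exact procedure.

```python
# Minimal sketch of generate-then-rerank for zero-shot XMC. Prompt
# format and the embedding-based reranker are assumptions made for
# illustration; see the paper for the actual ICXML procedure.
import numpy as np

def generate_candidates(llm, instance, demonstrations, k=20):
    """Stage 1: ask an LLM to propose a shortlist of candidate labels."""
    prompt = "\n".join(demonstrations) + f"\nInput: {instance}\nLabels:"
    return [c.strip() for c in llm(prompt).split(",")][:k]

def rerank(encoder, instance, candidates, label_space):
    """Stage 2: keep candidates that map onto the real label space,
    then rerank them by similarity to the input instance."""
    inst_vec = encoder.encode(instance)
    valid = [c for c in candidates if c in label_space]
    scores = [float(np.dot(inst_vec, encoder.encode(c))) for c in valid]
    return [c for _, c in sorted(zip(scores, valid), reverse=True)]
```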
RAGs to Style: Personalizing LLMs with Style Embeddings
Abhiman Neelakanteswara | Shreyas Chaudhari | Hamed Zamani
Proceedings of the 1st Workshop on Personalization of Generative AI Systems (PERSONALIZE 2024)
This paper studies the use of style embeddings to enhance author profiling for the personalization of Large Language Models (LLMs). Using a style-based Retrieval-Augmented Generation (RAG) approach, we study the efficacy of style embeddings in capturing distinctive authorial nuances, and the proposed method leverages this knowledge to enhance the personalization capabilities of LLMs. To assess the approach, we employ the LaMP benchmark, which is specifically tailored for evaluating language models across diverse dimensions of personalization. Our experiments reveal that, in comparison to term matching or context matching, style proves marginally superior for developing personalized LLMs.
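As a rough illustration of the retrieval step, the sketch below ranks a user's profile texts by style-embedding similarity to the input and prepends the closest ones to the prompt. The style_encoder interface and the prompt template are assumptions, not the paper's exact setup.

```python
# Hypothetical sketch of style-based retrieval augmentation: profile
# texts are ranked by style-embedding similarity and the top-k are
# prepended to the prompt. Encoder and template are assumed.
import numpy as np

def personalize_prompt(style_encoder, query, profile_texts, k=3):
    q = style_encoder.encode(query)
    ranked = sorted(
        profile_texts,
        key=lambda t: float(np.dot(q, style_encoder.encode(t))),
        reverse=True,
    )
    context = "\n".join(ranked[:k])  # most stylistically similar items
    return f"Author's previous writing:\n{context}\n\nTask input: {query}"
```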
LaMP: When Large Language Models Meet Personalization
Alireza Salemi | Sheshera Mysore | Michael Bendersky | Hamed Zamani
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
This paper highlights the importance of personalization in large language models and introduces LaMP, a novel benchmark for training and evaluating language models that produce personalized outputs. LaMP offers a comprehensive evaluation framework with diverse language tasks and multiple entries for each user profile. It consists of seven personalized tasks: three text classification and four text generation tasks. We additionally propose two retrieval augmentation approaches that retrieve personal items from each user profile to personalize language model outputs. To this end, we study various retrieval models, including term matching, semantic matching, and time-aware methods. Extensive experiments on LaMP with zero-shot and fine-tuned language models demonstrate the efficacy of the proposed retrieval augmentation approach and highlight the impact of personalization on various natural language tasks.
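For intuition, here is a small sketch of the term-matching variant of profile retrieval augmentation: score the user's profile entries against the task input with BM25 and prepend the top-k. The rank_bm25 package, the whitespace tokenization, and the prompt layout are choices made for this sketch, not the paper's setup.

```python
# Illustrative sketch of retrieval-augmented personalization with term
# matching (BM25). Tokenization and prompt layout are assumptions.
from rank_bm25 import BM25Okapi

def augment_with_profile(task_input, profile_entries, k=2):
    bm25 = BM25Okapi([entry.split() for entry in profile_entries])
    scores = bm25.get_scores(task_input.split())
    ranked = sorted(zip(scores, profile_entries), reverse=True)
    profile_block = "\n".join(entry for _, entry in ranked[:k])
    return f"{profile_block}\n{task_input}"
```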
2022
Predicting Prerequisite Relations for Unseen Concepts
Yaxin Zhu | Hamed Zamani
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
Concept prerequisite learning (CPL) plays a key role in developing technologies that help people learn a new complex topic or concept. Previous work commonly assumes that all concepts are given at training time and focuses solely on predicting the unseen prerequisite relationships between them. However, many real-world scenarios involve concepts that are undiscovered at training time, a setting that remains relatively unexplored. This paper studies this problem and proposes a novel alternating knowledge distillation approach that takes advantage of both content- and graph-based models for the task. Extensive experiments on three public benchmarks demonstrate improvements of up to 10% in F1 score.
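The alternating scheme can be pictured as two predictors taking turns as teacher and student, as in the hedged sketch below; the loss, optimizer, and training loop are illustrative assumptions, not the paper's exact recipe.

```python
# Rough sketch of alternating knowledge distillation between a
# content-based and a graph-based prerequisite predictor. All training
# details here (loss, optimizer, schedule) are assumptions.
import torch
import torch.nn.functional as F

def alternating_distillation(content_model, graph_model, concept_pairs, rounds=4):
    for r in range(rounds):
        teacher, student = ((content_model, graph_model) if r % 2 == 0
                            else (graph_model, content_model))
        opt = torch.optim.Adam(student.parameters(), lr=1e-4)
        for x in concept_pairs:  # x encodes a (concept, candidate prerequisite) pair
            with torch.no_grad():
                soft_target = torch.sigmoid(teacher(x))  # teacher's belief
            loss = F.binary_cross_entropy(torch.sigmoid(student(x)), soft_target)
            opt.zero_grad()
            loss.backward()
            opt.step()
```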
You can’t pick your neighbors, or can you? When and How to Rely on Retrieval in the kNN-LM
Andrew Drozdov | Shufan Wang | Razieh Rahimi | Andrew McCallum | Hamed Zamani | Mohit Iyyer
Findings of the Association for Computational Linguistics: EMNLP 2022
Retrieval-enhanced language models (LMs), which condition their predictions on text retrieved from large external datastores, have recently shown significant perplexity improvements compared to standard LMs. One such approach, the kNN-LM, interpolates any existing LM’s predictions with the output of a k-nearest neighbors model and requires no additional training. In this paper, we explore the importance of lexical and semantic matching in the context of items retrieved by the kNN-LM. We find two trends: (1) the presence of large overlapping n-grams between the datastore and evaluation set is an important factor in strong performance, even when the datastore is derived from the training data; and (2) the kNN-LM is most beneficial when retrieved items have high semantic similarity with the query. Based on our analysis, we define a new formulation of the kNN-LM that uses retrieval quality to assign the interpolation coefficient. We empirically measure the effectiveness of our approach on two English language modeling datasets, Wikitext-103 and PG-19. Our reformulation of the kNN-LM is beneficial in both cases, and leads to nearly a 4% improvement in perplexity on the Wikitext-103 test set.
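The reformulation amounts to replacing the fixed coefficient in the standard kNN-LM interpolation p(w) = λ·p_kNN(w) + (1−λ)·p_LM(w) with one computed from retrieval quality. The sketch below uses mean neighbor distance as that quality signal; the exact distance-to-λ mapping is an illustrative assumption, not the paper's formulation.

```python
# Sketch of retrieval-quality-adaptive interpolation for the kNN-LM.
# The distance-to-lambda mapping is an illustrative assumption.
import numpy as np

def adaptive_knn_lm(p_lm, p_knn, neighbor_distances):
    """Interpolate next-token distributions, trusting the kNN component
    more when the retrieved neighbors are close (high quality)."""
    quality = np.exp(-np.mean(neighbor_distances))  # in (0, 1]
    lam = float(np.clip(quality, 0.0, 1.0))
    return lam * p_knn + (1.0 - lam) * p_lm
```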
DISAPERE: A Dataset for Discourse Structure in Peer Review Discussions
Neha Nayak Kennard | Tim O’Gorman | Rajarshi Das | Akshay Sharma | Chhandak Bagchi | Matthew Clinton | Pranay Kumar Yelugam | Hamed Zamani | Andrew McCallum
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies
At the foundation of scientific evaluation is the labor-intensive process of peer review. This critical task requires participants to consume vast amounts of highly technical text. Prior work has annotated different aspects of review argumentation, but discourse relations between reviews and rebuttals have yet to be examined. We present DISAPERE, a labeled dataset of 20k sentences contained in 506 review-rebuttal pairs in English, annotated by experts. DISAPERE synthesizes label sets from prior work and extends them to include fine-grained annotation of the rebuttal sentences, characterizing their context in the review and the authors’ stance towards review arguments. Further, we annotate every review and rebuttal sentence. We show that discourse cues from rebuttals can shed light on the quality and interpretation of reviews. Moreover, an understanding of the argumentative strategies employed by reviewers and authors provides a useful signal for area chairs and other decision makers.
2019
Multi-step Entity-centric Information Retrieval for Multi-Hop Question Answering
Rajarshi Das | Ameya Godbole | Dilip Kavarthapu | Zhiyu Gong | Abhishek Singhal | Mo Yu | Xiaoxiao Guo | Tian Gao | Hamed Zamani | Manzil Zaheer | Andrew McCallum
Proceedings of the 2nd Workshop on Machine Reading for Question Answering
Multi-hop question answering (QA) requires an information retrieval (IR) system that can find the multiple pieces of supporting evidence needed to answer a question, making the retrieval process very challenging. This paper introduces an IR technique that uses information about entities present in the initially retrieved evidence to learn to ‘hop’ to other relevant evidence. In a setting with more than 5 million Wikipedia paragraphs, our approach leads to a significant boost in retrieval performance. The retrieved evidence also increases the performance of an existing QA model (without any training) on the benchmark by 10.59 F1 points.
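A minimal sketch of the entity-centric hopping idea, assuming generic retriever.search and entity_linker.extract interfaces (both hypothetical): entities mentioned in first-hop paragraphs expand the query for the next hop.

```python
# Illustrative sketch of entity-centric multi-step retrieval. The
# retriever and entity-linker interfaces are assumed, not the paper's.
def multi_hop_retrieve(retriever, entity_linker, question, hops=2, k=5):
    evidence = retriever.search(question, k=k)        # hop 1
    for _ in range(hops - 1):
        entities = {e for para in evidence
                    for e in entity_linker.extract(para)}
        expanded = question + " " + " ".join(sorted(entities))
        evidence += retriever.search(expanded, k=k)   # next 'hop'
    return evidence
```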
2015
Multitask Learning for Adaptive Quality Estimation of Automatically Transcribed Utterances
José G. C. de Souza | Hamed Zamani | Matteo Negri | Marco Turchi | Daniele Falavigna
Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies