Tal Baumel


2025

In-Context Learning on a Budget: A Case Study in Token Classification
Uri Berger | Tal Baumel | Gabriel Stanovsky
The Sixth Workshop on Insights from Negative Results in NLP

Few-shot in-context learning (ICL) typically assumes access to large annotated training sets. However, in many real-world scenarios, such as domain adaptation, there is only a limited budget to annotate a small number of samples, with the goal of maximizing downstream performance. We study various methods for selecting samples to annotate within a predefined budget, focusing on token classification tasks, which are expensive to annotate and relatively understudied in ICL setups. Across various tasks, models, and datasets, we observe that no method significantly outperforms the others, with most yielding similar results, including random sample selection for annotation. Moreover, we demonstrate that a relatively small annotated sample pool can achieve performance comparable to using the entire training set. We hope that future work adopts our realistic paradigm, which takes annotation budget into account.

2019

Question Answering as an Automatic Evaluation Metric for News Article Summarization
Matan Eyal | Tal Baumel | Michael Elhadad
Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)

Recent work in the field of automatic summarization and headline generation focuses on maximizing ROUGE scores for various news datasets. We present an alternative, extrinsic evaluation metric for this task, Answering Performance for Evaluation of Summaries (APES). APES utilizes recent progress in the field of reading comprehension to quantify the ability of a summary to answer a set of manually created questions regarding central entities in the source article. We first analyze the strength of this metric by comparing it to known manual evaluation metrics. We then present an end-to-end neural abstractive model that maximizes APES, while increasing ROUGE scores to competitive results.

2016

Sentence Embedding Evaluation Using Pyramid Annotation
Tal Baumel | Raphael Cohen | Michael Elhadad
Proceedings of the 1st Workshop on Evaluating Vector-Space Representations for NLP

2014

Query-Chain Focused Summarization
Tal Baumel | Raphael Cohen | Michael Elhadad
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)