Yosi Mass
2026
Fine-Grained Detection of Context-Grounded Hallucinations Using LLMs
Yehonatan Peisakhovsky | Zorik Gekhman | Yosi Mass | Liat Ein-Dor | Roi Reichart
Findings of the Association for Computational Linguistics: ACL 2026
Context-grounded hallucinations are cases where model outputs contain information not verifiable against the source text. We study the applicability of LLMs for localizing such hallucinations, as a more practical alternative to existing complex evaluation pipelines. In the absence of established benchmarks for meta-evaluation of hallucination localization, we construct one tailored to LLMs, involving a challenging human annotation of over 1,000 examples. We complement the benchmark with an LLM-based evaluation protocol, verifying its quality in a human evaluation. Since existing representations of hallucinations limit the types of errors that can be expressed, we propose a new representation based on free-form textual descriptions, capturing the full range of possible errors. We conduct a comprehensive study, evaluating four large-scale LLMs, which highlights the benchmark’s difficulty, as the best model achieves an F1 score of only 0.67. Through careful analysis, we offer insights into optimal prompting strategies for the task and identify the main factors that make it challenging for LLMs: (1) a tendency to incorrectly flag missing details as inconsistent, despite being instructed to check only facts in the output; and (2) difficulty with outputs containing factually correct information that is absent from the source, and thus not verifiable, due to alignment with the model’s parametric knowledge.
Will it Merge? On The Causes of Model Mergeability
Adir Rahamim | Asaf Yehudai | Boaz Carmeli | Leshem Choshen | Yosi Mass | Yonatan Belinkov
Findings of the Association for Computational Linguistics: ACL 2026
Model merging has emerged as a promising technique for combining multiple fine-tuned models into a single multitask model without retraining. However, the factors that determine whether merging will succeed or fail remain poorly understood. In this work, we investigate why some models merge better than others. To do so, we propose a concrete, measurable definition of mergeability. We investigate several potential causes for high or low mergeability, highlighting the base model's knowledge as a dominant factor: models fine-tuned on instances that the base model knows better are more mergeable than models fine-tuned on instances that the base model struggles with. Based on our mergeability definition, we explore a simple weighted merging technique that better preserves weak knowledge in the base model.
2024
More Bang for your Context: Virtual Documents for Question Answering over Long Documents
Yosi Mass | Boaz Carmeli | Asaf Yehudai | Assaf Toledo | Nathaniel Mills
Findings of the Association for Computational Linguistics: EMNLP 2024
We deal with the problem of Question Answering (QA) over a long document, which poses a challenge for modern Large Language Models (LLMs). Although LLMs can handle increasingly longer context windows, they struggle to effectively utilize the long content. To address this issue, we introduce the concept of a virtual document (VDoc). A VDoc is created by selecting chunks from the original document that are most likely to contain the information needed to answer the user’s question, while ensuring they fit within the LLM’s context window. We hypothesize that providing a short and focused VDoc to the LLM is more effective than filling the entire context window with less relevant information. Our experiments confirm this hypothesis and demonstrate that using VDocs improves results on the QA task.
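The chunk-selection idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the term-overlap relevance score, the whitespace token count, and the names `build_vdoc` and `budget_tokens` are all assumptions made for illustration, since the abstract does not say how chunks are scored or measured.

```python
def build_vdoc(chunks, question, budget_tokens):
    """Pick the highest-scoring chunks that fit the budget, kept in document order."""
    q_terms = set(question.lower().split())

    def score(chunk):
        # Hypothetical relevance score: number of question terms found in the chunk.
        return len(q_terms & set(chunk.lower().split()))

    # Rank chunk indices by score, best first; sorted() is stable,
    # so ties keep their original document order.
    ranked = sorted(range(len(chunks)), key=lambda i: -score(chunks[i]))

    chosen, used = [], 0
    for i in ranked:
        # Whitespace tokens stand in for the LLM tokenizer (an assumption).
        cost = len(chunks[i].split())
        if used + cost <= budget_tokens:
            chosen.append(i)
            used += cost

    # Restore document order so the virtual document reads coherently.
    return [chunks[i] for i in sorted(chosen)]


chunks = [
    "the cat sat on the mat",
    "paris is the capital of france",
    "dogs bark loudly at night",
    "france borders spain and italy",
]
vdoc = build_vdoc(chunks, "what is the capital of france", budget_tokens=11)
```

With an 11-token budget, the sketch keeps the two chunks that mention France and skips the rest, rather than filling the window with less relevant text.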
2022
Conversational Search with Mixed-Initiative - Asking Good Clarification Questions backed-up by Passage Retrieval
Yosi Mass | Doron Cohen | Asaf Yehudai | David Konopnicki
Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering
We deal with the scenario of conversational search, where user queries are under-specified or ambiguous. This calls for a mixed-initiative setup: user-asks (queries) and system-answers, as well as system-asks (clarification questions) and user responses, in order to clarify the user's information needs. We focus on the task of selecting the next clarification question, given the conversation context. Our method leverages passage retrieval from background content to fine-tune two deep-learning models for ranking candidate clarification questions. We evaluated our method on two different use-cases. The first is an open domain conversational search in a large web collection. The second is a task-oriented customer-support setup. We show that our method performs well on both use-cases.
2020
Agent Assist through Conversation Analysis
Kshitij Fadnis | Nathaniel Mills | Jatin Ganhotra | Haggai Roitman | Gaurav Pandey | Doron Cohen | Yosi Mass | Shai Erera | Chulaka Gunasekara | Danish Contractor | Siva Patel | Q. Vera Liao | Sachindra Joshi | Luis Lastras | David Konopnicki
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations
Customer support agents play a crucial role as an interface between an organization and its end-users. We propose CAIRAA: Conversational Approach to Information Retrieval for Agent Assistance, to reduce the cognitive workload of support agents who engage with users through conversation systems. CAIRAA monitors an evolving conversation and recommends both responses and URLs of documents the agent can use in replies to their client. We combine traditional information retrieval (IR) approaches with more recent Deep Learning (DL) models to ensure high accuracy and efficient run-time performance in the deployed system. Here, we describe the CAIRAA system and demonstrate its effectiveness in a pilot study via a short video.
Conversational Document Prediction to Assist Customer Care Agents
Jatin Ganhotra | Haggai Roitman | Doron Cohen | Nathaniel Mills | Chulaka Gunasekara | Yosi Mass | Sachindra Joshi | Luis Lastras | David Konopnicki
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
A frequent pattern in customer care conversations is the agents responding with appropriate webpage URLs that address users’ needs. We study the task of predicting the documents that customer care agents can use to facilitate users’ needs. We also introduce a new public dataset which supports the aforementioned problem. Using this dataset and two others, we investigate state-of-the-art deep learning (DL) and information retrieval (IR) models for the task. Additionally, we analyze the practicality of such systems in terms of inference time complexity. Our results show that a hybrid IR+DL approach provides the best of both worlds.
Unsupervised FAQ Retrieval with Question Generation and BERT
Yosi Mass | Boaz Carmeli | Haggai Roitman | David Konopnicki
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
We focus on the task of Frequently Asked Questions (FAQ) retrieval. A given user query can be matched against the questions and/or the answers in the FAQ. We present a fully unsupervised method that exploits the FAQ pairs to train two BERT models. The two models match user queries to FAQ answers and questions, respectively. We alleviate the missing labeled data of the latter by automatically generating high-quality question paraphrases. We show that our model is on par with, and even outperforms, supervised models on existing datasets.
Ad-hoc Document Retrieval using Weak-Supervision with BERT and GPT2
Yosi Mass | Haggai Roitman
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
We describe a weakly-supervised method for training deep learning models for the task of ad-hoc document retrieval. Our method is based on generative and discriminative models that are trained using weak-supervision just from the documents in the corpus. We present an end-to-end retrieval system that starts with traditional information retrieval methods, followed by two deep learning re-rankers. We evaluate our method on three different datasets: a COVID-19 related scientific literature dataset and two news datasets. We show that our method outperforms state-of-the-art methods, without the need for the expensive process of manually labeling data.
2019
A Summarization System for Scientific Documents
Shai Erera | Michal Shmueli-Scheuer | Guy Feigenblat | Ora Peled Nakash | Odellia Boni | Haggai Roitman | Doron Cohen | Bar Weiner | Yosi Mass | Or Rivlin | Guy Lev | Achiya Jerbi | Jonathan Herzig | Yufang Hou | Charles Jochim | Martin Gleize | Francesca Bonin | Debasis Ganguly | David Konopnicki
Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations
We present a novel system providing summaries for Computer Science publications. Through a qualitative user study, we identified the most valuable scenarios for discovery, exploration and understanding of scientific documents. Based on these findings, we built a system that retrieves and summarizes scientific documents for a given information need, either in the form of a free-text query or by choosing categorized values such as scientific tasks, datasets and more. Our system ingested 270,000 papers, and its summarization module aims to generate concise yet detailed summaries. We validated our approach with human experts.
2018
Semantic Relatedness of Wikipedia Concepts – Benchmark Data and a Working Solution
Liat Ein Dor | Alon Halfon | Yoav Kantor | Ran Levy | Yosi Mass | Ruty Rinott | Eyal Shnarch | Noam Slonim
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Learning Thematic Similarity Metric from Article Sections Using Triplet Networks
Liat Ein Dor | Yosi Mass | Alon Halfon | Elad Venezian | Ilya Shnayderman | Ranit Aharonov | Noam Slonim
Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
In this paper we propose leveraging the partition of articles into sections in order to learn a thematic similarity metric between sentences. We assume that a sentence is thematically closer to sentences within its section than to sentences from other sections. Based on this assumption, we use Wikipedia articles to automatically create a large dataset of weakly labeled sentence triplets, each composed of a pivot sentence, one sentence from the same section and one from another section. We train a triplet network to embed sentences from the same section closer together. To test the performance of the learned embeddings, we create and release a sentence clustering benchmark. We show that the triplet network learns useful thematic metrics that significantly outperform state-of-the-art semantic similarity methods and multipurpose embeddings on the task of thematic clustering of sentences. We also show that the learned embeddings perform well on the task of sentence semantic similarity prediction.
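The triplet construction described in this abstract can be sketched as follows. This is an illustrative sketch only, not the paper's pipeline: the function names `make_triplets` and `triplet_margin_loss`, the uniform random sampling, the Euclidean distance, and the margin value are all assumptions, since the abstract does not specify the sampling scheme or the training objective.

```python
import math
import random


def make_triplets(sections, rng):
    """Build (pivot, positive, negative) sentence triplets from one article.

    `sections` is a list of sections, each a list of sentences. The positive
    comes from the pivot's own section, the negative from any other section.
    """
    triplets = []
    for s_idx, section in enumerate(sections):
        others = [sent for j, sec in enumerate(sections) if j != s_idx for sent in sec]
        if len(section) < 2 or not others:
            continue  # need at least one positive and one negative candidate
        for p_idx, pivot in enumerate(section):
            positive = rng.choice(section[:p_idx] + section[p_idx + 1:])
            negative = rng.choice(others)
            triplets.append((pivot, positive, negative))
    return triplets


def triplet_margin_loss(pivot, positive, negative, margin=1.0):
    """Standard margin-based triplet loss on embedding vectors (assumed objective)."""
    return max(0.0, math.dist(pivot, positive) - math.dist(pivot, negative) + margin)


sections = [["a1", "a2"], ["b1", "b2", "b3"]]
triplets = make_triplets(sections, random.Random(0))
```

The loss is zero once the pivot sits closer to the same-section sentence than to the other-section sentence by at least the margin, which is the behavior the weak section labels are meant to encourage.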
Co-authors
- David Konopnicki 5
- Haggai Roitman 5
- Doron Cohen 4
- Boaz Carmeli 3
- Nathaniel Mills 3
- Asaf Yehudai 3
- Liat Ein Dor 2
- Shai Erera 2
- Jatin Ganhotra 2
- Chulaka Gunasekara 2
- Alon Halfon 2
- Sachindra Joshi 2
- Luis Lastras 2
- Noam Slonim 2
- Ranit Aharonov 1
- Yonatan Belinkov 1
- Odellia Boni 1
- Francesca Bonin 1
- Leshem Choshen 1
- Danish Contractor 1
- Liat Ein-Dor 1
- Kshitij Fadnis 1
- Guy Feigenblat 1
- Debasis Ganguly 1
- Zorik Gekhman 1
- Martin Gleize 1
- Jonathan Herzig 1
- Yufang Hou 1
- Achiya Jerbi 1
- Charles Jochim 1
- Yoav Kantor 1
- Guy Lev 1
- Ran Levy 1
- Q. Vera Liao 1
- Gaurav Pandey 1
- Siva Patel 1
- Yehonatan Peisakhovsky 1
- Ora Peled Nakash 1
- Adir Rahamim 1
- Roi Reichart 1
- Ruty Rinott 1
- Or Rivlin 1
- Michal Shmueli-Scheuer 1
- Eyal Shnarch 1
- Ilya Shnayderman 1
- Assaf Toledo 1
- Elad Venezian 1
- Bar Weiner 1