2025
NER Retriever: Zero-Shot Named Entity Retrieval with Type-Aware Embeddings
Or Shachar | Uri Katz | Yoav Goldberg | Oren Glickman
Findings of the Association for Computational Linguistics: EMNLP 2025
We present NER Retriever, a zero-shot retrieval framework for ad-hoc Named Entity Recognition (NER), where a user-defined type description is used to retrieve documents mentioning entities of that type. Instead of relying on fixed schemas or fine-tuned models, our method builds on pretrained large language models (LLMs) to embed both entity mentions and type descriptions into a shared semantic space. We show that internal representations—specifically, the value vectors from mid-layer transformer blocks—encode fine-grained type information more effectively than commonly used top-layer embeddings. To refine these representations, we train a lightweight contrastive projection network that aligns type-compatible entities while separating unrelated types. The resulting entity embeddings are compact, type-aware, and well-suited for nearest-neighbor search. Evaluated on three benchmarks, NER Retriever significantly outperforms both lexical (BM25) and dense (sentence-level) retrieval baselines, particularly in low-context settings. Our findings provide empirical support for representation selection within LLMs and demonstrate a practical solution for scalable, schema-free entity retrieval.
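The retrieval pipeline the abstract describes — project entity representations and a type-description representation into a shared space, then rank by nearest neighbor — can be sketched as follows. This is an illustrative toy, not the paper's code: the vectors stand in for mid-layer value vectors already extracted from an LLM, and the projection here is a fixed random linear map rather than the trained contrastive network.

```python
# Hypothetical sketch of type-aware nearest-neighbor entity retrieval.
# Dimensions, names, and the random projection are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
DIM_IN, DIM_OUT = 64, 16  # assumed hidden size / projected size

# Stand-ins for mid-layer value vectors of two entity mentions.
mention_vecs = {
    "Paris": rng.normal(size=DIM_IN),
    "Einstein": rng.normal(size=DIM_IN),
}
# A type-description vector ("city"-like), here simulated as a noisy
# copy of the matching mention's vector.
type_vec = mention_vecs["Paris"] + 0.1 * rng.normal(size=DIM_IN)

# Lightweight projection: in the paper this is trained contrastively;
# here it is a fixed random linear map, for illustration only.
W = rng.normal(size=(DIM_IN, DIM_OUT)) / np.sqrt(DIM_IN)

def project(v):
    z = v @ W
    return z / np.linalg.norm(z)  # unit-normalize so dot product = cosine

# Build the entity index once, then answer type queries by nearest neighbor.
index = {name: project(v) for name, v in mention_vecs.items()}
query = project(type_vec)
ranked = sorted(index, key=lambda n: float(index[n] @ query), reverse=True)
print(ranked[0])  # mention whose projected embedding is closest to the type query
```

In practice the index would hold one projected vector per entity mention in the corpus, and the nearest-neighbor step would use an approximate-search library rather than a full sort.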
Generating Tables from the Parametric Knowledge of Language Models
Yevgeni Berkovitch | Oren Glickman | Amit Somech | Tomer Wolfson
Proceedings of the 4th International Workshop on Knowledge-Augmented Methods for Natural Language Processing
We explore generating factual tables from the parametric knowledge of large language models (LLMs). While LLMs have demonstrated impressive capabilities in recreating knowledge bases and generating free-form text, their ability to generate structured tabular data has received little attention. To address this gap, we explore the table generation abilities of eight state-of-the-art LLMs, including GPT-4o and Llama3.1-405B, using three prompting methods: full-table, row-by-row, and cell-by-cell. To facilitate evaluation, we introduce WikiTabGen, a new benchmark consisting of 119 manually curated Wikipedia tables and their descriptions. Our findings show that table generation remains challenging, with the best-performing model (Llama3.1-405B) reaching only 25.4% accuracy. We further analyze how properties like table size, popularity, and numerical content impact performance. This study highlights the unique challenges of LLM-based table generation and offers a foundation for future research in this area. All code, data, and prompts are publicly available.
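The three prompting granularities mentioned in the abstract differ in how many model calls they make per table. A minimal sketch of the prompt construction, with hypothetical wording (the paper's exact templates may differ):

```python
# Illustrative prompt builders for the three granularities; the phrasing
# and helper names are assumptions, not the paper's actual templates.

def full_table_prompt(description: str) -> str:
    # One call: ask the model to emit the entire table at once.
    return f"Generate the complete table for: {description}"

def row_prompts(description: str, row_keys: list) -> list:
    # One call per row, each conditioned on the row's key entity.
    return [f"For the table '{description}', fill in the row for {k}."
            for k in row_keys]

def cell_prompts(description: str, row_keys: list, columns: list) -> list:
    # One call per cell: the most calls, but the smallest generation unit.
    return [f"For the table '{description}', what is the {c} of {k}?"
            for k in row_keys for c in columns]

desc = "Capitals of Scandinavian countries"
rows, cols = ["Norway", "Sweden"], ["Capital", "Population"]
print(len(row_prompts(desc, rows)))         # 2: one prompt per row
print(len(cell_prompts(desc, rows, cols)))  # 4: rows x columns prompts
```

The trade-off this exposes is cost versus locality: full-table is a single cheap call but asks the model to stay consistent over a long generation, while cell-by-cell multiplies calls but isolates each fact.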
2006
Direct Word Sense Matching for Lexical Substitution
Ido Dagan | Oren Glickman | Alfio Gliozzo | Efrat Marmorshtein | Carlo Strapparava
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics
Lexical Reference: a Semantic Matching Subtask
Oren Glickman | Eyal Shnarch | Ido Dagan
Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing
Investigating Lexical Substitution Scoring for Subtitle Generation
Oren Glickman | Ido Dagan | Walter Daelemans | Mikaela Keller | Samy Bengio
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)
2005
A Probabilistic Setting and Lexical Cooccurrence Model for Textual Entailment
Oren Glickman | Ido Dagan
Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment
Definition and Analysis of Intermediate Entailment Levels
Roy Bar-Haim | Idan Szpektor | Oren Glickman
Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment
1995
Using Context in Machine Translation of Spoken Language
Lori Levin | Oren Glickman | Yan Qu | Carolyn P. Rosé | Donna Gates | Alon Lavie | Alex Waibel | Carol Van Ess-Dykema
Proceedings of the Sixth Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages