Jack Boylan


2025

GLiREL - Generalist Model for Zero-Shot Relation Extraction
Jack Boylan | Chris Hokamp | Demian Gholipour Ghalandari
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)

We introduce GLiREL, an efficient architecture and training paradigm for zero-shot relation classification. Identifying relationships between entities is a key task in information extraction pipelines. The zero-shot setting for relation extraction, where a taxonomy of relations is not pre-specified, has proven particularly challenging because of the computational complexity of inference and the lack of labeled training data with sufficient coverage. Existing approaches rely upon distant supervision using auxiliary models to generate training data for unseen labels, upon very large general-purpose language models (LLMs), or upon complex pipeline models with multiple inference stages. Inspired by recent advancements in zero-shot named entity recognition, this paper introduces an approach to efficiently and accurately predict zero-shot relationship labels between multiple entities in a single forward pass. Experiments using the FewRel and WikiZSL benchmarks demonstrate that our approach achieves state-of-the-art results on the zero-shot relation classification task. In addition, we contribute a protocol for synthetically generating datasets with diverse relation labels.
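To make the single-forward-pass idea concrete, below is a minimal, hypothetical PyTorch sketch of the general pattern the abstract describes: encode the text once, embed the candidate relation labels, and score every entity pair against every label via embedding similarity, rather than running the encoder once per (pair, label) combination. The backbone, mean pooling, additive pair representation, and example spans are illustrative assumptions, not the paper's architecture.

```python
# Illustrative sketch of single-pass zero-shot relation scoring.
# Assumptions: distilbert-base-uncased backbone, mean pooling, and a
# dot-product scorer; GLiREL's actual architecture differs in detail.
import torch
from transformers import AutoModel, AutoTokenizer

backbone = "distilbert-base-uncased"  # assumed backbone for the sketch
tok = AutoTokenizer.from_pretrained(backbone)
enc = AutoModel.from_pretrained(backbone)

text = "Marie Curie was born in Warsaw and worked at the Sorbonne."
labels = ["place of birth", "employer", "spouse"]
# Character spans of pre-identified entities (from an upstream NER step).
entities = {"Marie Curie": (0, 11), "Warsaw": (24, 30), "Sorbonne": (49, 57)}

with torch.no_grad():
    # One forward pass over the text yields token embeddings reused
    # for every entity pair.
    batch = tok(text, return_tensors="pt", return_offsets_mapping=True)
    offsets = batch.pop("offset_mapping")[0]
    tokens = enc(**batch).last_hidden_state[0]          # (seq_len, dim)
    # One forward pass over the label set yields label embeddings
    # via masked mean pooling.
    lab = tok(labels, return_tensors="pt", padding=True)
    out = enc(**lab).last_hidden_state                  # (L, seq, dim)
    m = lab["attention_mask"].unsqueeze(-1)             # (L, seq, 1)
    label_emb = (out * m).sum(1) / m.sum(1)             # (L, dim)

def span_embedding(start, end):
    # Mean-pool token embeddings whose character offsets overlap the span.
    keep = (offsets[:, 0] < end) & (offsets[:, 1] > start) \
        & (offsets[:, 1] > offsets[:, 0])               # drop special tokens
    return tokens[keep].mean(dim=0)

names = list(entities)
for i, head in enumerate(names):
    for tail in names[i + 1:]:  # sketch ignores relation directionality
        pair = span_embedding(*entities[head]) + span_embedding(*entities[tail])
        scores = label_emb @ pair                       # (L,) similarity scores
        print(f"({head}, {tail}) -> {labels[int(scores.argmax())]}")
```

Because the text and the label set are each encoded once, the cost of scoring grows only with the cheap dot products over pairs and labels, which is the efficiency property the abstract emphasizes.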

2024

STAGE: Simplified Text-Attributed Graph Embeddings using Pre-trained LLMs
Aaron Zolnai-Lucas | Jack Boylan | Chris Hokamp | Parsa Ghaffari
Proceedings of the 1st Workshop on Knowledge Graphs and Large Language Models (KaLLM 2024)

We present STAGE, a straightforward yet effective method for enhancing node features in Graph Neural Network (GNN) models that encode Text-Attributed Graphs (TAGs). Our approach leverages Large Language Models (LLMs) to generate embeddings for textual attributes. STAGE achieves competitive results on various node classification benchmarks while remaining simple to implement relative to current state-of-the-art (SoTA) techniques. We show that using pre-trained LLMs as embedding generators provides robust features for ensemble GNN training, enabling pipelines that are simpler than current SoTA approaches, which require multiple expensive training and prompting stages. We also implement diffusion-pattern GNNs in an effort to make this pipeline scalable to graphs beyond academic benchmarks.
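The pipeline the abstract describes reduces to two steps: embed each node's text once with a frozen pre-trained embedding model, then train an ordinary GNN on those features. Here is a minimal sketch of that pattern; the embedding model, toy graph, and GCN layer sizes are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of the STAGE recipe: frozen LLM-based text embeddings
# as GNN node features. Model choice, graph, and hyperparameters are
# assumptions for illustration only.
import torch
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer
from torch_geometric.nn import GCNConv

# 1. A toy text-attributed graph: one text attribute per node.
node_texts = [
    "Title: Graph attention networks. Abstract: ...",
    "Title: Semi-supervised classification with GCNs. Abstract: ...",
    "Title: Language models are few-shot learners. Abstract: ...",
]
edge_index = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])  # undirected edges
labels = torch.tensor([0, 0, 1])                          # node classes

# 2. A single embedding pass with a frozen pre-trained encoder:
#    no LLM fine-tuning and no multi-stage prompting.
embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model
x = torch.tensor(embedder.encode(node_texts))       # (num_nodes, dim)

# 3. A standard GNN trained on the LLM-derived node features.
class GCN(torch.nn.Module):
    def __init__(self, dim_in, dim_hidden, num_classes):
        super().__init__()
        self.conv1 = GCNConv(dim_in, dim_hidden)
        self.conv2 = GCNConv(dim_hidden, num_classes)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

model = GCN(x.size(1), 64, num_classes=2)
opt = torch.optim.Adam(model.parameters(), lr=0.01)
for _ in range(50):
    opt.zero_grad()
    loss = F.cross_entropy(model(x, edge_index), labels)
    loss.backward()
    opt.step()
```

Since the text encoder is frozen, the embeddings can be computed once and cached, so only the comparatively small GNN is trained; this is the source of the simplicity relative to multi-stage SoTA pipelines.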