Kerria Pang-Naylor


Fixing paper assignments

  1. Please select all papers that belong to the same person.
  2. Indicate below which author they should be assigned to.
Provide a valid ORCID iD here. This will be used to match future papers to this author.
Provide the name of the school or the university where the author has received or will receive their highest degree (e.g., Ph.D. institution for researchers, or current affiliation for students). This will be used to form the new author page ID, if needed.

TODO: "submit" and "cancel" buttons here


2025

pdf bib
Controllable Clustering with LLM-driven Embeddings
Kerria Pang-Naylor | Shivani Manivasagan | Aitong Zhong | Mehak Garg | Nicholas Mondello | Blake Buckner | Jonathan P. Chang | Khyati Mahajan | Masoud Hashemi | Fabio Casati
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Given the inherent subjectivity of similarity in text, fully unsupervised text clustering is unlikely to produce groupings that work across a variety of use cases. Traditional techniques to guide clustering rely on costly, time-consuming human feedback and/or pre-existing labels. Leveraging recent advancements in LLMs and decoder-only embedding models, we present techniques to effectively control text embeddings with minimal human input: prefix instructions and LLM preprocessing. We evaluate clustering performance for datasets with multiple independent ground-truth labels, or perspectives, and find that these techniques can be used to improve clustering for one perspective or use case, at the cost of a tradeoff in performance for another use case.