Chris Brown


2025

pdf bib
Human and LLM-Based Resume Matching: An Observational Study
Swanand Vaishampayan | Hunter Leary | Yoseph Berhanu Alebachew | Louis Hickman | Brent A. Stevenor | Weston Beck | Chris Brown
Findings of the Association for Computational Linguistics: NAACL 2025

Resume matching assesses the extent to which candidates qualify for jobs based on the content of resumes. This process increasingly uses natural language processing (NLP) techniques to automate parsing and rating tasks—saving time and effort. Large language models (LLMs) are increasingly used for this purpose—thus, we explore their capabilities for resume matching in an observational study. We compare zero-shot GPT-4 and human ratings for 736 resumes submitted to job openings from diverse fields using real-world evaluation criteria. We also study the effects of prompt engineering techniques on GPT-4 ratings and compare differences in GPT-4 and human ratings across racial and gender groups. Our results show: LLM scores correlate minorly with humans, suggesting they are not interchangeable; prompt engineering such as CoT improves the quality of LLM ratings; and LLM scores do not show larger group differences (i.e., bias) than humans. Our findings provide implications for LLM-based resume rating to promote more fair and NLP-based resume matching in a multicultural world.