Thijmen Bijl

2026

Enhancing Job Evaluation with Data Augmentation and Text Classification
Samaneh Jalilian | Niels van Weeren | Mohammad Shokri | Thijmen Bijl | Suzan Verberne
Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

Accurate job grading and evaluation are essential for ensuring fair compensation in Human Resources (HR) planning. In this research, we propose to improve job evaluation by semi-automating a manual, time-consuming, and inconsistent process with text-based classification and regression models. We address three prediction tasks: job title classification, grade prediction, and compensation prediction. For job title classification, we fine-tune a RoBERTa model and use Gemini to generate synthetic job descriptions for rare job titles. For grade and compensation prediction, we compare TF-IDF and transformer-based embeddings (DistilRoBERTa, MPNet, MiniLM) in combination with deep neural networks and tree-based models (Random Forest, XGBoost). We tune the hyperparameters of all models using grid search with cross-validation. The results show that job title classification by RoBERTa with Gemini-generated descriptions reaches an accuracy of about 97%. In the regression experiments, a tuned TF-IDF + XGBoost model achieves a mean absolute error (MAE) of 0.185 for grade prediction, and MiniLM embeddings with XGBoost achieve an MAE of €1,587 for annual salary prediction. These findings indicate that a semi-automated pipeline can enhance traditional manual processes by improving consistency, speeding up HR workflows, and reducing biased assessments.
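The abstract reports results in terms of TF-IDF features and mean absolute error. As a rough illustration only, and not the authors' implementation, the sketch below computes smoothed TF-IDF weights and MAE with the Python standard library. The function names, the whitespace tokenization, and the example job titles and grades are all assumptions made for this sketch; the regressors used in the paper (XGBoost, Random Forest, neural networks) are not shown.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document.

    Assumption: whitespace tokenization, relative term frequency,
    and a smoothed IDF of log((1 + N) / (1 + df)) + 1.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens))
            * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between targets and predictions."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical job titles and grades, for illustration only.
titles = ["senior data engineer", "junior data analyst"]
vectors = tfidf_vectors(titles)
error = mean_absolute_error([10, 12, 9], [10, 11, 11])
```

In this toy input, a term that appears in only one title ("senior") receives a higher weight than a term shared by both ("data"), which is the property that makes TF-IDF useful for separating job descriptions.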
