KARRIEREWEGE: A large scale Career Path Prediction Dataset
Elena Senger, Yuri Campbell, Rob van der Goot, Barbara Plank
Abstract
Accurate career path prediction can support many stakeholders, like job seekers, recruiters, HR, and project managers. However, publicly available data and tools for career path prediction are scarce. In this work, we introduce Karrierewege, a comprehensive, publicly available dataset containing over 500k career paths, significantly surpassing the size of previously available datasets. We link the dataset to the ESCO taxonomy to offer a valuable resource for predicting career trajectories. To tackle the problem of free-text inputs typically found in resumes, we enhance it by synthesizing job titles and descriptions resulting in Karrierewege+. This allows for accurate predictions from unstructured data, closely aligning with practical application challenges. We benchmark existing state-of-the-art (SOTA) models on our dataset and a previous benchmark and see increased performance and robustness by synthesizing the data for the free-text use cases.
- Anthology ID:
- 2025.coling-industry.46
- Volume:
- Proceedings of the 31st International Conference on Computational Linguistics: Industry Track
- Month:
- January
- Year:
- 2025
- Address:
- Abu Dhabi, UAE
- Editors:
- Owen Rambow, Leo Wanner, Marianna Apidianaki, Hend Al-Khalifa, Barbara Di Eugenio, Steven Schockaert, Kareem Darwish, Apoorv Agarwal
- Venue:
- COLING
- Publisher:
- Association for Computational Linguistics
- Pages:
- 533–545
- URL:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2025.coling-industry.46/
- Cite (ACL):
- Elena Senger, Yuri Campbell, Rob van der Goot, and Barbara Plank. 2025. KARRIEREWEGE: A large scale Career Path Prediction Dataset. In Proceedings of the 31st International Conference on Computational Linguistics: Industry Track, pages 533–545, Abu Dhabi, UAE. Association for Computational Linguistics.
- Cite (Informal):
- KARRIEREWEGE: A large scale Career Path Prediction Dataset (Senger et al., COLING 2025)
- PDF:
- https://preview.aclanthology.org/jlcl-multiple-ingestion/2025.coling-industry.46.pdf