Ann-Sophie Gnehm

Also published as: Ann-sophie Gnehm


Fine-Grained Extraction and Classification of Skill Requirements in German-Speaking Job Ads
Ann-sophie Gnehm | Eva Bühlmann | Helen Buchs | Simon Clematide
Proceedings of the Fifth Workshop on Natural Language Processing and Computational Social Science (NLP+CSS)

Monitoring the development of labor market skill requirements is an information need that is increasingly addressed by applying text mining methods to job advertisement data. We present an approach for fine-grained extraction and classification of skill requirements from German-speaking job advertisements. We adapt pre-trained transformer-based language models to the domain and task of computing meaningful representations of sentences or spans. By using context from job advertisements and the large ESCO domain ontology, we improve our similarity-based unsupervised multi-label classification results. Our best model achieves a mean average precision of 0.969 on the skill class level.

Evaluation of Transfer Learning and Domain Adaptation for Analyzing German-Speaking Job Advertisements
Ann-Sophie Gnehm | Eva Bühlmann | Simon Clematide
Proceedings of the Thirteenth Language Resources and Evaluation Conference

This paper presents text mining approaches on German-speaking job advertisements to enable social science research on the development of the labour market over the last 30 years. In order to build text mining applications providing information about the profession and main task of a job, as well as the experience and ICT skills needed, we experiment with transfer learning and domain adaptation. Our main contribution consists in building language models that are adapted to the domain of job advertisements, and in assessing them on a broad range of machine learning problems. Our findings show the large value of domain adaptation in several respects. First, it boosts the performance of fine-tuned task-specific models consistently over all evaluation experiments. Second, it helps to mitigate rapid data shift over time in our special domain, and enhances the ability to learn from small updates with new, labeled task data. Third, domain adaptation of language models is efficient: with continued in-domain pre-training, we are able to outperform general-domain language models pre-trained on ten times more data. We share our domain-adapted language models and data with the research community.


Text Zoning and Classification for Job Advertisements in German, French and English
Ann-Sophie Gnehm | Simon Clematide
Proceedings of the Fourth Workshop on Natural Language Processing and Computational Social Science

We present experiments to structure job ads into text zones and classify them into professions, industries and management functions, thereby facilitating social science analyses on labor market demand. Our main contributions are empirical findings on the benefits of contextualized embeddings and the potential of multi-task models for this purpose. With contextualized in-domain embeddings in BiLSTM-CRF models, we reach an accuracy of 91% for token-level text zoning and outperform previous approaches. A multi-tasking BERT model performs well for our classification tasks. We further compare transfer approaches for our multilingual data.