Klara Venglarova


2024

Extracting position titles from unstructured historical job advertisements
Klara Venglarova | Raven Adam | Georg Vogeler
Proceedings of the 4th International Conference on Natural Language Processing for Digital Humanities

This paper explores the automated extraction of job titles from unstructured historical job advertisements, using a corpus of digitized German-language newspapers from 1850-1950. The study addresses the challenges of working with unstructured, OCR-processed historical data, in contrast to contemporary approaches, which often use structured, born-digital datasets for this text type. We compare four extraction methods: a dictionary-based approach, a rule-based approach, a named entity recognition (NER) model, and a text-generation method. The NER approach, trained on manually annotated data, achieved the highest F1 score (0.944 with a transformer model trained on GPU, 0.884 with a model trained on CPU), demonstrating its flexibility and ability to correctly identify job titles. The text-generation approach performs similarly (0.920). The rule-based (0.69) and dictionary-based (0.632) methods also reach relatively high F1 scores, while offering the advantage of not requiring extensive labeling of training data. The results highlight the complexities of extracting meaningful job titles from historical texts, with implications for further research into labor market trends and occupational history.
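
As a rough illustration of the simplest of the four methods and of the metric used to compare them, the following minimal sketch shows a dictionary lookup and a set-based F1 computation. It is not the authors' implementation: the lexicon `JOB_TITLES`, the functions `extract_titles` and `f1_score`, and the sample advertisement are all hypothetical and chosen only for demonstration.

```python
import re

# Hypothetical mini-lexicon of German job titles (an assumption for this
# sketch; the dictionary used in the paper is not reproduced here).
JOB_TITLES = ["Köchin", "Kutscher", "Buchhalter", "Verkäuferin"]


def extract_titles(text, lexicon=JOB_TITLES):
    """Return lexicon entries found in `text` as whole words, in order of appearance."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, lexicon)) + r")\b")
    return pattern.findall(text)


def f1_score(predicted, gold):
    """Set-based F1: harmonic mean of precision and recall over unique titles."""
    pred, ref = set(predicted), set(gold)
    true_positives = len(pred & ref)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)


# Invented sample advertisement: "A capable cook and a coachman are sought."
ad = "Eine tüchtige Köchin und ein Kutscher werden gesucht."
found = extract_titles(ad)

# Invented gold annotation containing one title the lexicon does not cover.
score = f1_score(found, ["Köchin", "Kutscher", "Magd"])
```

A lookup of this kind needs no annotated training data, but it misses any title absent from the lexicon or distorted by OCR, which lowers recall and therefore F1.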