Vera Aleksić
Also published as:
Vera Aleksic
This paper presents a sentence generation pipeline as implemented on the online job board Stepstone. The goal is to automatically create a set of sentences, related to a given input skill, for the candidate profile and task description sections of a job ad. The generated sentences must cover two “tone of voice” variants in German (Du, Sie), three experience levels (junior, mid, senior), and two optionality values (the skill is mandatory, or optional/nice to have). The generation process distinguishes between soft skills, natural language competencies and hard skills, as well as more specific sub-categories such as IT skills, programming languages and similar. To create grammatically consistent text, morphosyntactic features from the proprietary skill ontology and lexicon are consulted. The approach is a lexicon-driven generation process: the lexical features of each new input skill are compared with those of the skills already added to the sentence database, and new sentences are created according to the corresponding templates.
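As an illustration only (not part of the paper), the lexicon-driven template selection can be pictured with a minimal Python sketch. All names here (SkillEntry, TEMPLATES, generate), the feature inventory, and the example templates are hypothetical assumptions, not Stepstone's actual system:

```python
from dataclasses import dataclass

@dataclass
class SkillEntry:
    """A skill plus the morphosyntactic features consulted during generation."""
    lemma: str      # e.g. "Teamfähigkeit"
    category: str   # e.g. "soft", "language", "it", "programming"
    gender: str     # grammatical gender: "f", "m", "n"
    number: str     # "sg" or "pl"

# Templates keyed by (tone, level, optionality); a full system would hold
# sentence sets for every 2 x 3 x 2 combination. Two illustrative entries:
TEMPLATES = {
    ("Du", "junior", "mandatory"):
        "Du bringst erste Erfahrung mit {skill} mit.",
    ("Sie", "senior", "optional"):
        "Idealerweise verfügen Sie über langjährige Erfahrung mit {skill}.",
}

def generate(skill: SkillEntry, tone: str, level: str, optionality: str) -> str:
    """Pick the template for the requested variant and fill in the skill.

    A real system would additionally use the skill's gender/number features
    to inflect articles and adjectives; here the slot is filled verbatim.
    """
    template = TEMPLATES[(tone, level, optionality)]
    return template.format(skill=skill.lemma)

print(generate(SkillEntry("Python", "programming", "n", "sg"),
               "Du", "junior", "mandatory"))
# -> Du bringst erste Erfahrung mit Python mit.
```

The point of keying templates on the full variant tuple is that each combination of tone, level and optionality can carry its own phrasing while the skill slot stays governed by the lexicon's morphosyntactic features.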
The following paper presents a lexical analysis component as implemented in the PANACEA project. The goal is to automatically extract lexicon entries from crawled corpora, in an attempt to use corpus-based methods for high-quality linguistic text processing, and to focus on the quality of data without neglecting quantitative aspects. Lexical analysis has the task to assign linguistic information (like: part of speech, inflectional class, gender, subcategorisation frame, semantic properties etc.) to all parts of the input text. If tokens are ambiguous, lexical analysis must provide all possible sets of annotation for later (syntactic) disambiguation, be it tagging, or full parsing. The paper presents an approach for assigning part-of-speech tags for German and English to large input corpora (> 50 mio tokens), providing a workflow which takes as input crawled corpora and provides POS-tagged lemmata ready for lexicon integration. Tools include sentence splitting, lexicon lookup, decomposition, and POS defaulting. Evaluation shows that the overall error rate can be brought down to about 2% if language resources are properly designed. The complete workflow is implemented as a sequence of web services integrated into the PANACEA platform.
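For illustration only, here is a toy Python sketch of the workflow stages named above (sentence splitting, lexicon lookup, compound decomposition, POS defaulting). All function names and lexicon entries are hypothetical and greatly simplified relative to the actual PANACEA web services:

```python
# Toy lexicon: token -> all (POS, lemma) readings. Ambiguous tokens keep
# every reading so that later (syntactic) disambiguation can choose.
LEXICON = {
    "der":   [("ART", "der"), ("PRELS", "der")],
    "Hund":  [("NN", "Hund")],
    "bellt": [("VVFIN", "bellen")],
}

def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter; the real workflow uses a dedicated tool."""
    return [s.strip() for s in text.split(".") if s.strip()]

def decompose(token: str) -> list[tuple[str, str]]:
    """Compound decomposition fallback: analyse an unknown German compound
    via its rightmost constituent (its head). Greatly simplified."""
    for i in range(1, len(token)):
        head = token[i:].capitalize()
        if head in LEXICON:
            return [(pos, token) for pos, _ in LEXICON[head]]
    return []

def analyse(token: str) -> list[tuple[str, str]]:
    """Lexicon lookup, then decomposition, then POS defaulting."""
    readings = LEXICON.get(token) or LEXICON.get(token.lower())
    if readings:
        return readings
    readings = decompose(token)
    if readings:
        return readings
    # POS defaulting: guess a category for unknown tokens rather than fail.
    return [("NN", token)] if token[0].isupper() else [("XY", token)]

for sentence in split_sentences("Der Wachhund bellt."):
    for token in sentence.split():
        print(token, "->", analyse(token))
```

In this sketch the unknown compound "Wachhund" is analysed through its head "Hund", and any token the lexicon and decomposition both miss still receives a defaulted tag, mirroring the design goal that the pipeline annotates all parts of the input text rather than failing on gaps.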