2023
Cross-lingual Mediation: Readability Effects
Maria Kunilovskaya | Ruslan Mitkov | Eveline Wandl-Vogt
Proceedings of the Second Workshop on Text Simplification, Accessibility and Readability
This paper explores the readability of translated and interpreted texts compared to the original source texts and target language texts in the same domain. It has been shown in the literature that translated and interpreted texts can exhibit lexical and syntactic properties that make them simpler, and hence easier to process, than their sources or comparable non-translations. In translation, this effect is attributed to the tendency to simplify and disambiguate the message. In interpreting, it can be enhanced by the temporal and cognitive constraints. We use readability annotations from the Newsela corpus to formulate a number of classification and regression tasks and fine-tune a multilingual pre-trained model on these tasks, obtaining models that can differentiate between complex and simple sentences. Then, the models are applied to predict the readability of sources, targets, and comparable target language originals in a zero-shot manner. Our test data – parallel and comparable – come from English-German bidirectional interpreting and translation subsets from the Europarl corpus. The results confirm the difference in readability between translated/interpreted targets and sentences in standard originally-authored source and target languages. In addition, we find consistent differences between the translation directions in the English-German language pair.
2020
Identification of Indigenous Knowledge Concepts through Semantic Networks, Spelling Tools and Word Embeddings
Renato Rocha Souza | Amelie Dorn | Barbara Piringer | Eveline Wandl-Vogt
Proceedings of the Twelfth Language Resources and Evaluation Conference
In order to access indigenous, regional knowledge contained in language corpora, semantic tools and network methods are most typically employed. In this paper we present an approach for the identification of dialectal variations of words, or words that do not pertain to High German, using the example of non-standard language legacy collection questionnaires of the Bavarian Dialects in Austria (DBÖ). Based on selected cultural categories relevant to the wider project context, common words from each of these cultural categories were identified, and their lemmas were obtained using GermaLemma. Through word embedding models the semantic vicinity of each word was explored, followed by the use of the German wordnet (GermaNet) and the Hunspell tool. Whilst none of these tools has comprehensive coverage of standard German words, they serve as an indication of dialects in specific semantic hierarchies. Methods and tools applied in this study may serve as an example for other similar projects dealing with non-standard or endangered language collections, aiming to access, analyze and ultimately preserve native regional language heritage.
2017
Proceedings of the Workshop Knowledge Resources for the Socio-Economic Sciences and Humanities associated with RANLP 2017
Kalliopi Zervanou | Petya Osenova | Eveline Wandl-Vogt | Dan Cristea
2014
How to semantically relate dialectal Dictionaries in the Linked Data Framework
Thierry Declerck | Eveline Wandl-Vogt
Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)
A SKOS-based Schema for TEI encoded Dictionaries at ICLTT
Thierry Declerck | Karlheinz Mörth | Eveline Wandl-Vogt
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
At our institutes we are working with a number of dictionaries and lexical resources in the field of less-resourced language data, such as dialects and historical languages. We aim to publish these lexical data in the Linked Open Data framework in order to link them with available data sets for highly-resourced languages, thus elevating them to the same digital dignity that the mainstream languages have gained. In this paper we concentrate on two TEI encoded variants of the Arabic language and propose a mapping of this TEI encoded data onto SKOS, showing how the lexical entries of the two dialectal dictionaries can be linked to other language resources available in the Linked Open Data cloud.