Tom S Juzek


2020

How Human is Machine Translationese? Comparing Human and Machine Translations of Text and Speech
Yuri Bizzoni | Tom S Juzek | Cristina España-Bonet | Koel Dutta Chowdhury | Josef van Genabith | Elke Teich
Proceedings of the 17th International Conference on Spoken Language Translation

Translationese is a phenomenon present in human translations, simultaneous interpreting, and even machine translations. Some translationese features tend to appear in simultaneous interpreting with higher frequency than in human text translation, but the reasons for this are unclear. This study analyzes translationese patterns in translation, interpreting, and machine translation outputs in order to explore possible reasons. In our analysis, we (i) detail two non-invasive ways of detecting translationese and (ii) compare translationese across human and machine translations from text and speech. We find that machine translation shows traces of translationese but does not reproduce the patterns found in human translation, supporting the hypothesis that such patterns are due to the model (human vs. machine) rather than to the data (written vs. spoken).

Exploring diachronic syntactic shifts with dependency length: the case of scientific English
Tom S Juzek | Marie-Pauline Krielke | Elke Teich
Proceedings of the Fourth Workshop on Universal Dependencies (UDW 2020)

We report on an application of universal dependencies for the study of diachronic shifts in syntactic usage patterns. Our focus is on the evolution of Scientific English in the Late Modern English period (ca. 1700-1900). Our data set is the Royal Society Corpus (RSC), comprising the full set of publications of the Royal Society of London between 1665 and 1996. Our starting assumption is that over time, Scientific English develops specific syntactic choice preferences that increase efficiency in (expert-to-expert) communication. The specific hypothesis we pursue in this paper is that changing syntactic choice preferences lead to greater dependency locality/dependency length minimization, which is associated with positive effects for the efficiency of human as well as computational linguistic processing. As a basis for our measurements, we parsed the RSC using Stanford CoreNLP. Overall, we observe a decrease in dependency length, with long dependency structures becoming less frequent and short dependency structures becoming more frequent over time, notably pertaining to the nominal phrase, thus marking an overall push towards greater communicative efficiency.
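
The abstract does not spell out how dependency length is measured. As a rough illustration only, the sketch below assumes one common definition: the linear distance, in tokens, between each dependent and its head, averaged over a sentence. The function names and the example sentence are hypothetical, and the head indices follow the CoNLL-U convention (1-based, with 0 marking the root); this is not the authors' pipeline.

```python
from statistics import mean


def dependency_lengths(heads):
    """Return the linear distance between each token and its head.

    heads[i] is the 1-based index of the head of token i + 1;
    a head of 0 marks the root, which has no incoming dependency.
    (Assumed convention, borrowed from CoNLL-U.)
    """
    return [abs(head - position)
            for position, head in enumerate(heads, start=1)
            if head != 0]


def mean_dependency_length(heads):
    """Average dependency length of one sentence (0.0 if it has no arcs)."""
    lengths = dependency_lengths(heads)
    return mean(lengths) if lengths else 0.0


# Hypothetical example: "The committee approved the report"
# The -> committee, committee -> approved, approved = root,
# the -> report, report -> approved
example_heads = [2, 3, 0, 5, 3]
print(dependency_lengths(example_heads))      # [1, 1, 1, 2]
print(mean_dependency_length(example_heads))  # 1.25
```

Under this assumed measure, a diachronic trend would show up as a change in the per-period average of such sentence-level values.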

Schwa-deletion in German noun-noun compounds
Tom S Juzek | Jana Haeussler
Proceedings of the Workshop on the Cognitive Aspects of the Lexicon

We report ongoing research on linking elements in German compounds, with a focus on noun-noun compounds in which the first constituent ends in schwa. We present a corpus of about 3000 nouns ending in schwa, annotated for various phonological and morpho-syntactic features and, critically, the dominant linking strategy. The corpus analysis is complemented by an unsuccessful attempt to train neural networks and by a pilot experiment asking native speakers to indicate their preferred linking strategy. In addition to existing nouns, the experimental stimuli included nonce words, also ending in schwa. While neither the corpus study nor the experiment offers a clear picture, the results nevertheless provide interesting insights into the intricacies of German compounding. Overall, we find a predominance of the paradigmatic linking element -n for feminine and masculine nouns. At the same time, the results for nonce words show that -n is not a default strategy.