2022
EPIC UdS - Creation and Applications of a Simultaneous Interpreting Corpus
Heike Przybyl | Ekaterina Lapshinova-Koltunski | Katrin Menzel | Stefan Fischer | Elke Teich
Proceedings of the Thirteenth Language Resources and Evaluation Conference
In this paper, we describe the creation and annotation of EPIC UdS, a multilingual corpus of simultaneous interpreting for English, German and Spanish. We give an overview of the comparable and parallel, aligned corpus variants and explore various applications of the corpus. What makes EPIC UdS relevant is that it is one of the rare interpreting corpora that includes transcripts suitable for research on more than one language pair and on interpreting with regard to German. It not only contains transcribed speeches, but also rich metadata and fine-grained linguistic annotations tailored for diverse applications across a broad range of linguistic subfields.
2020
The Royal Society Corpus 6.0: Providing 300+ Years of Scientific Writing for Humanistic Study
Stefan Fischer | Jörg Knappen | Katrin Menzel | Elke Teich
Proceedings of the Twelfth Language Resources and Evaluation Conference
We present a new, extended version of the Royal Society Corpus (RSC), a diachronic corpus of scientific English now covering 300+ years of scientific writing (1665–1996). The corpus comprises 47 837 texts, primarily scientific articles, and is based on publications of the Royal Society of London, mainly its Philosophical Transactions and Proceedings. The corpus has been built on the basis of the FAIR principles and is freely available under a Creative Commons license, excluding copyrighted parts. We provide information on how the corpus can be found, the file formats available for download as well as accessibility via a web-based corpus query platform. We show a number of analytic tools that we have implemented for better usability and provide an example of use of the corpus for linguistic analysis as well as examples of subsequent, external uses of earlier releases. We place the RSC against the background of existing English diachronic/scientific corpora, elaborating on its value for linguistic and humanistic study.
2017
The Making of the Royal Society Corpus
Jörg Knappen | Stefan Fischer | Hannah Kermes | Elke Teich | Peter Fankhauser
Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language
2016
Compasses, Magnets, Water Microscopes: Annotation of Terminology in a Diachronic Corpus of Scientific Texts
Anne-Kathrin Schumann | Stefan Fischer
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
The specialised lexicon belongs to the most prominent attributes of specialised writing: Terms function as semantically dense encodings of specialised concepts, which, in the absence of terms, would require lengthy explanations and descriptions. In this paper, we argue that terms are the result of diachronic processes on both the semantic and the morpho-syntactic level. Very little is known about these processes. We therefore present a corpus annotation project aiming at revealing how terms are coined and how they evolve to fit their function as semantically and morpho-syntactically dense encodings of specialised knowledge. The scope of this paper is two-fold: Firstly, we outline our methodology for annotating terminology in a diachronic corpus of scientific publications. Moreover, we provide a detailed analysis of our annotation results and suggest methods for improving the accuracy of annotations in a setting as difficult as ours. Secondly, we present results of a pilot study based on the annotated terms. The results suggest that terms in older texts are linguistically relatively simple units that are hard to distinguish from the lexicon of general language. We believe that this supports our hypothesis that terminology undergoes diachronic processes of densification and specialisation.
2015
Vector-space calculation of semantic surprisal for predicting word pronunciation duration
Asad Sayeed | Stefan Fischer | Vera Demberg
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)