2022
“The word expired when that world awoke.” New Challenges for Research with Large Text Corpora and Corpus-Based Discourse Studies in Totalitarian Times
Hanno Biber
Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-10)
This poster proposal reports on the prospects of a promising corpus project initiated by one of the large digital text corpora hosted by the Austrian Academy of Sciences. The resources of the AAC-Austrian Academy Corpus, which was founded in 2001 and is a valuable example of a digital diachronic text corpus suitable for corpus-based discourse studies and lexicography based upon historical sources, can be used as a basis for trying to answer new questions concerning the challenges of doing linguistic research with large digital text corpora in the context of studying totalitarian language use. These questions, as well as the chances and limits of such an approach, have obvious references to current events as well as a clearly historical dimension, precisely because the digital text sources that have been created to analyse German language use in the Nazi period from 1933 to 1945 can be understood as a model for dealing with related questions of contemporary language use, particularly in the context of Russia's new war of extermination in Ukraine in 2022 and how it is represented in contemporary media.
2020
Challenges for Making Use of a Large Text Corpus such as the ‘AAC – Austrian Academy Corpus’ for Digital Literary Studies
Hanno Biber
Proceedings of the 8th Workshop on Challenges in the Management of Large Corpora
This presentation addresses the challenges of making use of a large text corpus such as the ‘AAC – Austrian Academy Corpus’ for the purposes of digital literary studies. The research question of how to use a digital text corpus of considerable size for such a specific research purpose is of interest both for corpus research in general and for digital literary text studies, which rely to a large extent on large digital text corpora. Observations of the usage of lexical entities such as words, word forms, multi-word units and many other linguistic units determine the way in which texts are studied and explored. Larger entities have to be taken into account as well, which is why questions of semantic analysis and larger structures come into play. The texts of the AAC – Austrian Academy Corpus, which was founded in 2001, are German language texts of historical and cultural significance from the period between 1848 and 1989. The aim of this study is to present possible research questions for corpus-based methodological approaches to the digital study of literary texts and to give examples of early experiments and experiences with making use of a large text corpus for these research purposes.
2012
Fivehundredmillionandone Tokens. Loading the AAC Container with Text Resources for Text Studies.
Hanno Biber, Evelyn Breiteneder
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
The "AAC - Austrian Academy Corpus" is a diachronic German language digital text corpus of more than 500 million tokens. The corpus comprises several thousand texts representing a wide range of different text types. The primary research aim is to develop text language resources for the study of texts. For corpus linguistics and corpus-based language research, large text corpora need to be structured in a systematic way. For this structural purpose the AAC makes use of the notion of a container. By container in the context of corpus research we understand a flexible system of pragmatic representation, manipulation, modification and structured storage of annotated items of text. The issue of representing a large corpus in formats that offer only limited space is paradigmatic for the general task of representing a language by just a small collection of text or a small sample of the language. Methods based upon structural normalization and standardization have to be developed in order to provide useful instruments for text studies.
2008
Words in Contexts: Digital Editions of Literary Journals in the “AAC - Austrian Academy Corpus”
Hanno Biber, Evelyn Breiteneder, Karlheinz Mörth
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
In this paper two highly innovative digital editions will be presented. For the creation and the implementation of these editions the latest developments within corpus research have been taken into account. The digital editions of the historical literary journals Die Fackel (published by Karl Kraus in Vienna from 1899 to 1936) and Der Brenner (published by Ludwig Ficker in Innsbruck from 1910 to 1954) have been developed within the corpus research framework of the AAC - Austrian Academy Corpus at the Austrian Academy of Sciences in collaboration with other researchers and programmers in the AAC from Vienna together with the graphic designer Anne Burdick from Los Angeles. For the creation of these scholarly digital editions the AAC edition philosophy and edition principles have been applied whereby new corpus research methods have been made use of for questions of computational philology and textual studies in a digital environment. The examples of the digital online editions of the literary journals Die Fackel and Der Brenner will give insights into the potentials and the benefits of making corpus research methods and techniques available for scholarly research into language and literature.
2004
The AAC [Austrian Academy Corpus] – An Enterprise to Develop Large Electronic Text Corpora
Hanno Biber, Evelyn Breiteneder
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)