Swiss-AL: A Multilingual Swiss Web Corpus for Applied Linguistics
Julia Krasselt, Philipp Dressen, Matthias Fluor, Cerstin Mahlow, Klaus Rothenhäusler, Maren Runte
Abstract
The Swiss Web Corpus for Applied Linguistics (Swiss-AL) is a multilingual (German, French, Italian) collection of texts from selected web sources. Unlike most other web corpora, it is not intended for NLP purposes but is designed to support data-based and data-driven research on societal and political discourses in Switzerland. It currently contains 8 million texts (approx. 1.55 billion tokens), including news and specialist publications; governmental opinions and parliamentary records; websites of political parties, companies, and universities; and statements from industry associations and NGOs. A flexible processing pipeline using state-of-the-art components allows researchers in applied linguistics to create tailor-made subcorpora for studying discourse in a wide range of domains. So far, Swiss-AL has been used successfully in research on Swiss public discourses on energy and on antibiotic resistance.
- Anthology ID:
- 2020.lrec-1.510
- Volume:
- Proceedings of the Twelfth Language Resources and Evaluation Conference
- Month:
- May
- Year:
- 2020
- Address:
- Marseille, France
- Editors:
- Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry Declerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis
- Venue:
- LREC
- Publisher:
- European Language Resources Association
- Pages:
- 4145–4151
- Language:
- English
- URL:
- https://aclanthology.org/2020.lrec-1.510
- Cite (ACL):
- Julia Krasselt, Philipp Dressen, Matthias Fluor, Cerstin Mahlow, Klaus Rothenhäusler, and Maren Runte. 2020. Swiss-AL: A Multilingual Swiss Web Corpus for Applied Linguistics. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 4145–4151, Marseille, France. European Language Resources Association.
- Cite (Informal):
- Swiss-AL: A Multilingual Swiss Web Corpus for Applied Linguistics (Krasselt et al., LREC 2020)
- PDF:
- https://preview.aclanthology.org/emnlp22-frontmatter/2020.lrec-1.510.pdf