2020
CEASR: A Corpus for Evaluating Automatic Speech Recognition
Malgorzata Anna Ulasik | Manuela Hürlimann | Fabian Germann | Esin Gedik | Fernando Benites | Mark Cieliebak
Proceedings of the 12th Language Resources and Evaluation Conference
In this paper, we present CEASR, a Corpus for Evaluating the quality of Automatic Speech Recognition (ASR). It is a data set based on public speech corpora, containing metadata along with transcripts generated by several modern state-of-the-art ASR systems. CEASR provides this data in a unified structure, consistent across all corpora and systems, with normalised transcript texts and metadata. We use CEASR to evaluate the quality of ASR systems by calculating an average Word Error Rate (WER) per corpus, per system and per corpus-system pair. Our experiments show a substantial difference in accuracy between commercial and open-source ASR tools, as well as differences of up to a factor of ten for single systems on different corpora. Using CEASR allowed us to obtain these results efficiently and easily. Our corpus enables researchers to perform ASR-related evaluations and various in-depth analyses with noticeably reduced effort, i.e. without the need to collect, process and transcribe the speech data themselves.
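The WER aggregation mentioned in this abstract can be illustrated with a minimal sketch. It is not the CEASR implementation. It assumes WER is the word-level edit distance divided by the reference length, and that averages are unweighted means over utterances; the paper may pool or weight differently. The function names and the record layout `(corpus, system, reference, hypothesis)` are hypothetical.

```python
from collections import defaultdict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def average_wer(records):
    """Unweighted mean WER per corpus, per system and per corpus-system pair.

    `records` is an iterable of (corpus, system, reference, hypothesis) tuples;
    this layout is an assumption made for the example.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for corpus, system, reference, hypothesis in records:
        wer = word_error_rate(reference, hypothesis)
        keys = (("corpus", corpus), ("system", system), ("pair", (corpus, system)))
        for key in keys:
            sums[key] += wer
            counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

Because insertions count as errors, WER computed this way can exceed 1.0 when a hypothesis is much longer than its reference.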
ZHAW-InIT - Social Media Geolocation at VarDial 2020
Fernando Benites | Manuela Hürlimann | Pius von Däniken | Mark Cieliebak
Proceedings of the 7th Workshop on NLP for Similar Languages, Varieties and Dialects
We describe our approaches for the Social Media Geolocation (SMG) task at the VarDial Evaluation Campaign 2020. The goal was to predict the geographical location (latitude and longitude) of an input text. There were three subtasks corresponding to German-speaking Switzerland (CH), Germany and Austria (DE-AT), and Croatia, Bosnia and Herzegovina, Montenegro and Serbia (BCMS). We submitted solutions to all subtasks but focused our development efforts on the CH subtask, where we achieved third place out of 16 submissions with a median distance of 15.93 km and had the best result of 14 unconstrained systems. In the DE-AT subtask, we ranked sixth out of 10 submissions (fourth of 8 unconstrained systems), and in the BCMS subtask we achieved fourth place out of 13 submissions (second of 11 unconstrained systems).
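The median-distance figure reported in this abstract can be illustrated with a short sketch. The abstract does not say how distance is computed, so the sketch assumes great-circle (haversine) distance on a sphere with a mean Earth radius of 6371 km. The function names are hypothetical and this is not the shared task's official scorer.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = phi2 - phi1
    delta_lambda = math.radians(lon2 - lon1)
    a = (math.sin(delta_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def median_distance_km(predicted, gold) -> float:
    """Median distance between predicted and gold (latitude, longitude) pairs."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    return median(
        haversine_km(p[0], p[1], g[0], g[1]) for p, g in zip(predicted, gold)
    )
```

The median is less sensitive than the mean to a few predictions that land very far from the gold location, which is one plausible reason to report it for this kind of task.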
2017
SemEval-2017 Task 5: Fine-Grained Sentiment Analysis on Financial Microblogs and News
Keith Cortis | André Freitas | Tobias Daudert | Manuela Huerlimann | Manel Zarrouk | Siegfried Handschuh | Brian Davis
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)
This paper discusses the “Fine-Grained Sentiment Analysis on Financial Microblogs and News” task as part of SemEval-2017, specifically under the “Detecting sentiment, humour, and truth” theme. The task contains two tracks: the first concerns Microblog messages and the second covers News Statements and Headlines. The main goal of both tracks was to predict the sentiment score for each of the mentioned companies/stocks. The sentiment score for each text instance is a floating-point value in the range from -1 (very negative/bearish) to 1 (very positive/bullish), with 0 designating neutral sentiment. The task attracted a total of 32 participants, with 25 participating in Track 1 and 29 in Track 2.
2016
Combining Lexical and Spatial Knowledge to Predict Spatial Relations between Objects in Images
Manuela Hürlimann | Johan Bos
Proceedings of the 5th Workshop on Vision and Language
Semantic Relation Classification: Task Formalisation and Refinement
Vivian Santos | Manuela Huerliman | Brian Davis | Siegfried Handschuh | André Freitas
Proceedings of the 5th Workshop on Cognitive Aspects of the Lexicon (CogALex - V)
The identification of semantic relations between terms within texts is a fundamental task in Natural Language Processing which can support applications requiring a lightweight semantic interpretation model. Currently, semantic relation classification concentrates on relations which are evaluated over open-domain data. This work provides a critique of the set of abstract relations used for semantic relation classification with regard to their ability to express relationships between terms which are found in domain-specific corpora. Based on this analysis, this work proposes an alternative semantic relation model based on reusing and extending the set of abstract relations present in the DOLCE ontology. The resulting set of relations is well grounded, captures a wide range of relations and could thus be used as a foundation for automatic classification of semantic relations.