2024
Efficient Performance Tracking: Leveraging Large Language Models for Automated Construction of Scientific Leaderboards
Furkan Şahinuç | Thy Thy Tran | Yulia Grishina | Yufang Hou | Bei Chen | Iryna Gurevych
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing
Scientific leaderboards are standardized ranking systems that facilitate evaluating and comparing competitive methods. Typically, a leaderboard is defined by a task, dataset, and evaluation metric (TDM) triple, allowing objective performance assessment and fostering innovation through benchmarking. However, the exponential increase in publications has made it infeasible to construct and maintain these leaderboards manually. Automatic leaderboard construction has emerged as a solution to reduce manual labor. Existing datasets for this task are based on the community-contributed leaderboards without additional curation. Our analysis shows that a large portion of these leaderboards are incomplete, and some of them contain incorrect information. In this work, we present SciLead, a manually-curated Scientific Leaderboard dataset that overcomes the aforementioned problems. Building on this dataset, we propose three experimental settings that simulate real-world scenarios where TDM triples are fully defined, partially defined, or undefined during leaderboard construction. While previous research has only explored the first setting, the latter two are more representative of real-world applications. To address these diverse settings, we develop a comprehensive LLM-based framework for constructing leaderboards. Our experiments and analysis reveal that various LLMs often correctly identify TDM triples while struggling to extract result values from publications. We make our code and data publicly available.
2022
Local-to-global learning for iterative training of production SLU models on new features
Yulia Grishina | Daniil Sorokin
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track
In production SLU systems, new training data becomes available over time, so ML models need to be updated on a regular basis. Specifically, releasing new features adds new classes of data while the old data remains constant. However, retraining the full model from scratch each time is computationally expensive. To address this problem, we propose to consider production releases from the curriculum learning perspective and to adapt the local-to-global learning (LGL) schedule (Cheng et al., 2019) for a statistical model that starts with fewer output classes and adds more classes with each iteration. We report experiments for the tasks of intent classification and slot filling in the context of a production voice assistant. First, we apply the original LGL schedule on our data and then adapt LGL to the production setting where the full data is not available at initial training iterations. We demonstrate that our method reduces model error rates by 7.3% and saves up to 25% training time for individual iterations.
2021
Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference
Maciej Ogrodniczuk | Sameer Pradhan | Massimo Poesio | Yulia Grishina | Vincent Ng
2020
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference
Maciej Ogrodniczuk | Vincent Ng | Yulia Grishina | Sameer Pradhan
Truecasing German user-generated conversational text
Yulia Grishina | Thomas Gueudre | Ralf Winkler
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)
True-casing, the task of restoring proper case to (generally) lower case input, is important in downstream tasks and for screen display. In this paper, we investigate truecasing as an intrinsic task and present several experiments on noisy user queries to a voice-controlled dialog system. In particular, we compare rule-based, n-gram language model (LM), and recurrent neural network (RNN) approaches, evaluating the results on a German Q&A corpus and reporting accuracy for different case categories. We show that while RNNs reach higher accuracy, especially on large datasets, character n-gram models with interpolation are still competitive, in particular on mixed-case words where their fall-back mechanisms come into play.
2019
Proceedings of the Second Workshop on Computational Models of Reference, Anaphora and Coreference
Maciej Ogrodniczuk | Sameer Pradhan | Yulia Grishina | Vincent Ng
2018
Anaphora Resolution with the ARRAU Corpus
Massimo Poesio | Yulia Grishina | Varada Kolhatkar | Nafise Moosavi | Ina Roesiger | Adam Roussel | Fabian Simonjetz | Alexandra Uma | Olga Uryupina | Juntao Yu | Heike Zinsmeister
Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference
The ARRAU corpus is an anaphorically annotated corpus of English providing rich linguistic information about anaphora resolution. The most distinctive feature of the corpus is the annotation of a wide range of anaphoric relations, including bridging references and discourse deixis in addition to identity (coreference). Other distinctive features include treating all NPs as markables, including non-referring NPs, and the annotation of a variety of morphosyntactic and semantic mention and entity attributes, including the genericity status of the entities referred to by markables. The corpus, however, has not been extensively used for anaphora resolution research so far. In this paper, we discuss three datasets extracted from the ARRAU corpus to support the three subtasks of the CRAC 2018 Shared Task: identity anaphora resolution over ARRAU-style markables, bridging reference resolution, and discourse deixis resolution. We also present the evaluation scripts assessing system performance on those datasets, and preliminary results on these three tasks that may serve as baselines for subsequent research on these phenomena.
2017
Multi-source annotation projection of coreference chains: assessing strategies and testing opportunities
Yulia Grishina | Manfred Stede
Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017)
In this paper, we examine the possibility of using annotation projection from multiple sources for automatically obtaining coreference annotations in the target language. We implement a multi-source annotation projection algorithm and apply it on an English-German-Russian parallel corpus in order to transfer coreference chains from two sources to the target side. Operating in two settings – a low-resource and a more linguistically-informed one – we show that automatic coreference transfer could benefit from combining information from multiple languages, and assess the quality of both the extraction and the linking of target coreference mentions.
CORBON 2017 Shared Task: Projection-Based Coreference Resolution
Yulia Grishina
Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017)
The CORBON 2017 Shared Task, organised as part of the Coreference Resolution Beyond OntoNotes workshop at EACL 2017, presented a new challenge for multilingual coreference resolution: we offer a projection-based setting in which one is supposed to build a coreference resolver for a new language exploiting little or even no knowledge of it, with our languages of interest being German and Russian. We additionally offer a more traditional setting, targeting the development of a multilingual coreference resolver without any restrictions on the resources and methods used. In this paper, we describe the task setting and provide the results of one participant who successfully completed the task, comparing their results to the closely related previous research. Analysing the task setting and the results, we discuss the major challenges and make suggestions on the future directions of coreference evaluation.
Combining the output of two coreference resolution systems for two source languages to improve annotation projection
Yulia Grishina
Proceedings of the Third Workshop on Discourse in Machine Translation
Although parallel coreference corpora can substantially support the development of SMT systems, there are no large-scale parallel datasets available due to the complexity of the annotation task and the variability in annotation schemes. In this study, we exploit an annotation projection method to combine the output of two coreference resolution systems for two different source languages (English, German) in order to create an annotated corpus for a third language (Russian). We show that our technique is superior to projecting annotations from a single source language, and we provide an in-depth analysis of the projected annotations in order to assess the prospects of our approach.
2016
Experiments on bridging across languages and genres
Yulia Grishina
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016)
Anaphoricity in Connectives: A Case Study on German
Manfred Stede | Yulia Grishina
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016)
2015
Knowledge-lean projection of coreference chains across languages
Yulia Grishina | Manfred Stede
Proceedings of the Eighth Workshop on Building and Using Comparable Corpora
2014
Conceptual and Practical Steps in Event Coreference Analysis of Large-scale Data
Fatemeh Torabi Asr | Jonathan Sonntag | Yulia Grishina | Manfred Stede
Proceedings of the Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation