Yulia Grishina


2022

Local-to-global learning for iterative training of production SLU models on new features
Yulia Grishina | Daniil Sorokin
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track

In production SLU systems, new training data becomes available with time so that ML models need to be updated on a regular basis. Specifically, releasing new features adds new classes of data while the old data remains constant. However, retraining the full model each time from scratch is computationally expensive. To address this problem, we propose to consider production releases from the curriculum learning perspective and to adapt the local-to-global learning (LGL) schedule (Cheng et al., 2019) for a statistical model that starts with fewer output classes and adds more classes with each iteration. We report experiments for the tasks of intent classification and slot filling in the context of a production voice-assistant. First, we apply the original LGL schedule on our data and then adapt LGL to the production setting where the full data is not available at initial training iterations. We demonstrate that our method improves model error rates by 7.3% and saves up to 25% training time for individual iterations.

2021

Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference
Maciej Ogrodniczuk | Sameer Pradhan | Massimo Poesio | Yulia Grishina | Vincent Ng
Proceedings of the Fourth Workshop on Computational Models of Reference, Anaphora and Coreference

2020

Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference
Maciej Ogrodniczuk | Vincent Ng | Yulia Grishina | Sameer Pradhan
Proceedings of the Third Workshop on Computational Models of Reference, Anaphora and Coreference

Truecasing German user-generated conversational text
Yulia Grishina | Thomas Gueudre | Ralf Winkler
Proceedings of the Sixth Workshop on Noisy User-generated Text (W-NUT 2020)

True-casing, the task of restoring proper case to (generally) lower case input, is important in downstream tasks and for screen display. In this paper, we investigate truecasing as an intrinsic task and present several experiments on noisy user queries to a voice-controlled dialog system. In particular, we compare a rule-based, an n-gram language model (LM) and a recurrent neural network (RNN) approaches, evaluating the results on a German Q&A corpus and reporting accuracy for different case categories. We show that while RNNs reach higher accuracy especially on large datasets, character n-gram models with interpolation are still competitive, in particular on mixed-case words where their fall-back mechanisms come into play.

2019

Proceedings of the Second Workshop on Computational Models of Reference, Anaphora and Coreference
Maciej Ogrodniczuk | Sameer Pradhan | Yulia Grishina | Vincent Ng
Proceedings of the Second Workshop on Computational Models of Reference, Anaphora and Coreference

2018

Anaphora Resolution with the ARRAU Corpus
Massimo Poesio | Yulia Grishina | Varada Kolhatkar | Nafise Moosavi | Ina Roesiger | Adam Roussel | Fabian Simonjetz | Alexandra Uma | Olga Uryupina | Juntao Yu | Heike Zinsmeister
Proceedings of the First Workshop on Computational Models of Reference, Anaphora and Coreference

The ARRAU corpus is an anaphorically annotated corpus of English providing rich linguistic information about anaphora resolution. The most distinctive feature of the corpus is the annotation of a wide range of anaphoric relations, including bridging references and discourse deixis in addition to identity (coreference). Other distinctive features include treating all NPs as markables, including non-referring NPs; and the annotation of a variety of morphosyntactic and semantic mention and entity attributes, including the genericity status of the entities referred to by markables. The corpus however has not been extensively used for anaphora resolution research so far. In this paper, we discuss three datasets extracted from the ARRAU corpus to support the three subtasks of the CRAC 2018 Shared Task – identity anaphora resolution over ARRAU-style markables, bridging references resolution, and discourse deixis; the evaluation scripts assessing system performance on those datasets; and preliminary results on these three tasks that may serve as baseline for subsequent research in these phenomena.

2017

Multi-source annotation projection of coreference chains: assessing strategies and testing opportunities
Yulia Grishina | Manfred Stede
Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017)

In this paper, we examine the possibility of using annotation projection from multiple sources for automatically obtaining coreference annotations in the target language. We implement a multi-source annotation projection algorithm and apply it on an English-German-Russian parallel corpus in order to transfer coreference chains from two sources to the target side. Operating in two settings – a low-resource and a more linguistically-informed one – we show that automatic coreference transfer could benefit from combining information from multiple languages, and assess the quality of both the extraction and the linking of target coreference mentions.

CORBON 2017 Shared Task: Projection-Based Coreference Resolution
Yulia Grishina
Proceedings of the 2nd Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2017)

The CORBON 2017 Shared Task, organised as part of the Coreference Resolution Beyond OntoNotes workshop at EACL 2017, presented a new challenge for multilingual coreference resolution: we offer a projection-based setting in which one is supposed to build a coreference resolver for a new language exploiting little or even no knowledge of it, with our languages of interest being German and Russian. We additionally offer a more traditional setting, targeting the development of a multilingual coreference resolver without any restrictions on the resources and methods used. In this paper, we describe the task setting and provide the results of one participant who successfully completed the task, comparing their results to the closely related previous research. Analysing the task setting and the results, we discuss the major challenges and make suggestions on the future directions of coreference evaluation.

Combining the output of two coreference resolution systems for two source languages to improve annotation projection
Yulia Grishina
Proceedings of the Third Workshop on Discourse in Machine Translation

Although parallel coreference corpora can to a high degree support the development of SMT systems, there are no large-scale parallel datasets available due to the complexity of the annotation task and the variability in annotation schemes. In this study, we exploit an annotation projection method to combine the output of two coreference resolution systems for two different source languages (English, German) in order to create an annotated corpus for a third language (Russian). We show that our technique is superior to projecting annotations from a single source language, and we provide an in-depth analysis of the projected annotations in order to assess the perspectives of our approach.

2016

Experiments on bridging across languages and genres
Yulia Grishina
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016)

Anaphoricity in Connectives: A Case Study on German
Manfred Stede | Yulia Grishina
Proceedings of the Workshop on Coreference Resolution Beyond OntoNotes (CORBON 2016)

2015

Knowledge-lean projection of coreference chains across languages
Yulia Grishina | Manfred Stede
Proceedings of the Eighth Workshop on Building and Using Comparable Corpora

2014

Conceptual and Practical Steps in Event Coreference Analysis of Large-scale Data
Fatemeh Torabi Asr | Jonathan Sonntag | Yulia Grishina | Manfred Stede
Proceedings of the Second Workshop on EVENTS: Definition, Detection, Coreference, and Representation