Olga Babko-Malaya

2024

pdf abs
Expanding Russian PropBank: Challenges and Insights for Developing New SRL Resources
Skatje Myers | Roman Khamov | Adam Pollins | Rebekah Tozier | Olga Babko-Malaya | Martha Palmer
Proceedings of the Fifth International Workshop on Designing Meaning Representations @ LREC-COLING 2024

Semantic role labeling (SRL) resources, such as Proposition Bank (PropBank), provide useful input to downstream applications. In this paper we present some challenges and insights we learned while expanding the previously developed Russian PropBank. This new effort involved annotation and adjudication of all predicates within a subset of the prior work in order to provide a test corpus for future applications. We discuss a number of new issues that arose while developing our PropBank for Russian as well as our solutions. Framing issues include: distinguishing between morphological processes that warrant new frames, differentiating between modal verbs and predicate verbs, and maintaining accurate representations of a given language’s semantics. Annotation issues include disagreements derived from variability in Universal Dependency parses and semantic ambiguity within the text. Finally, we demonstrate how Russian sentence structures reveal inherent limitations to PropBank’s ability to capture semantic data. These discussions should prove useful to anyone developing a PropBank or similar SRL resources for a new language.

2012

pdf abs
Identifying Nuggets of Information in GALE Distillation Evaluation
Olga Babko-Malaya | Greg Milette | Michael Schneider | Sarah Scogin
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

This paper describes an approach to automatic nuggetization and implemented system employed in GALE Distillation evaluation to measure the information content of text returned in response to an open-ended question. The system identifies nuggets, or atomic units of information, categorizes them according to their semantic type, and selects different types of nuggets depending on the type of the question. We further show how this approach addresses the main challenges for using automatic nuggetization for QA evaluation: the variability of relevant nuggets and their dependence on the question. Specifically, we propose a template-based approach to nuggetization, where different semantic categories of nuggets are extracted dependent on the template of a question. During evaluation, human annotators judge each snippet returned in response to a query as relevant or irrelevant, whereas automatic template-based nuggetization is further used to identify the semantic units of information that people would have selected as relevant' or irrelevant' nuggets for a given query. Finally, the paper presents the performance results of the nuggetization system which compare the number of automatically generated nuggets and human nuggets and show that our automatic nuggetization is consistent with human judgments.

2010

pdf abs
Evaluation of Document Citations in Phase 2 Gale Distillation
Olga Babko-Malaya | Dan Hunter | Connie Fournelle | Jim White
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The focus of information retrieval evaluations, such as NISTs TREC evaluations (e.g. Voorhees 2003), is on evaluation of the information content of system responses. On the other hand, retrieval tasks usually involve two different dimensions: reporting relevant information and providing sources of information, including corroborating evidence and alternative documents. Under the DARPA Global Autonomous Language Exploitation (GALE) program, Distillation provides succinct, direct responses to the formatted queries using the outputs of automated transcription and translation technologies. These responses are evaluated in two dimensions: information content, which measures the amount of relevant and non-redundant information, and document support, which measures the number of alternative sources provided in support of reported information. The final metric in the overall GALE distillation evaluation combines the results of scoring of both query responses and document citations. In this paper, we describe our evaluation framework with emphasis on the scoring of document citations and an analysis of how systems perform at providing sources of information.

2008

pdf abs
Annotation of Nuggets and Relevance in GALE Distillation Evaluation
Olga Babko-Malaya
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents an approach to annotation that BAE Systems has employed in the DARPA GALE Phase 2 Distillation evaluation. The purpose of the GALE Distillation evaluation is to quantify the amount of relevant and non-redundant information a distillation engine is able to produce in response to a specific, formatted query; and to compare that amount of information to the amount of information gathered by a bilingual human using commonly available state-of-the-art tools. As part of the evaluation, following NIST evaluation methodology of complex question answering (Voorhees, 2003), human annotators were asked to establish the relevancy of responses as well as the presence of atomic facts or information units, called nuggets of information. This paper discusses various challenges to the annotation of nuggets, called nuggetization, which include interaction between the granularity of nuggets and relevancy of these nuggets to the query in question. The approach proposed in the paper views nuggetization as a procedural task and allows annotators to revisit nuggetization based on the requirements imposed by the relevancy guidelines defined with a specific end-user in mind. This approach is shown in the paper to produce consistent annotations with high inter-annotator agreement scores.

In this paper, we present the details of creating a pilot Arabic proposition bank (Propbank). Propbanks exist for both English and Chinese. However the morphological and syntactic expression of linguistic phenomena in Arabic yields a very different type of process in creating an Arabic propbank. Hence, we highlight those characteristics of Arabic that make creating a propbank for the language a different challenge compared to the creation of an English Propbank.We believe that many of the lessons learned in dealing with Arabic could generalise to other languages that exhibit equally rich morphology and relatively free word order.