Judith L. Klavans

Also published as: J. Klavans, Judith Klavans


2022

Unsupervised Stem-based Cross-lingual Part-of-Speech Tagging for Morphologically Rich Low-Resource Languages
Ramy Eskander | Cass Lowry | Sujay Khandagale | Judith Klavans | Maria Polinsky | Smaranda Muresan
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

Unsupervised cross-lingual projection for part-of-speech (POS) tagging relies on the use of parallel data to project POS tags from a source language for which a POS tagger is available onto a target language across word-level alignments. The projected tags then form the basis for learning a POS model for the target language. However, languages with rich morphology often yield sparse word alignments because words corresponding to the same citation form do not align well. We hypothesize that for morphologically complex languages, it is more efficient to use the stem rather than the word as the core unit of abstraction. Our contributions are: 1) we propose an unsupervised stem-based cross-lingual approach for POS tagging for low-resource languages of rich morphology; 2) we further investigate morpheme-level alignment and projection; and 3) we examine whether the use of linguistic priors for morphological segmentation improves POS tagging. We conduct experiments using six source languages and eight morphologically complex target languages of diverse typologies. Our results show that the stem-based approach improves the POS models for all the target languages, with an average relative error reduction of 10.3% in accuracy per target language, and outperforms the word-based approach that operates on three times more data for about two thirds of the language pairs we consider. Moreover, we show that morpheme-level alignment and projection and the use of linguistic priors for morphological segmentation further improve POS tagging.

Towards Unsupervised Morphological Analysis of Polysynthetic Languages
Sujay Khandagale | Yoann Léveillé | Samuel Miller | Derek Pham | Ramy Eskander | Cass Lowry | Richard Compton | Judith Klavans | Maria Polinsky | Smaranda Muresan
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)

Polysynthetic languages present a challenge for morphological analysis due to the complexity of their words and the lack of high-quality annotated datasets needed to build and/or evaluate computational models. The contribution of this work is twofold. First, using linguists’ help, we generate and contribute high-quality annotated data for two low-resource polysynthetic languages for two tasks: morphological segmentation and part-of-speech (POS) tagging. Second, we present the results of state-of-the-art unsupervised approaches for these two tasks on Adyghe and Inuktitut. Our findings show that for these polysynthetic languages, using linguistic priors helps the task of morphological segmentation and that using stems rather than words as the core unit of abstraction leads to superior performance on POS tagging.

2021

Minimally-Supervised Morphological Segmentation using Adaptor Grammars with Linguistic Priors
Ramy Eskander | Cass Lowry | Sujay Khandagale | Francesca Callejas | Judith Klavans | Maria Polinsky | Smaranda Muresan
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

2020

MorphAGram, Evaluation and Framework for Unsupervised Morphological Segmentation
Ramy Eskander | Francesca Callejas | Elizabeth Nichols | Judith Klavans | Smaranda Muresan
Proceedings of the Twelfth Language Resources and Evaluation Conference

Computational morphological segmentation has been an active research topic for decades as it is beneficial for many natural language processing tasks. With the high cost of manually labeling data for morphology and the increasing interest in low-resource languages, unsupervised morphological segmentation has become essential for processing a typologically diverse set of languages, whether high-resource or low-resource. In this paper, we present and release MorphAGram, a publicly available framework for unsupervised morphological segmentation that uses Adaptor Grammars (AG) and is based on the work presented by Eskander et al. (2016). We conduct an extensive quantitative and qualitative evaluation of this framework on 12 languages and show that the framework achieves state-of-the-art results across languages of different typologies (from fusional to polysynthetic and from high-resource to low-resource).

2019

Unsupervised Morphological Segmentation for Low-Resource Polysynthetic Languages
Ramy Eskander | Judith Klavans | Smaranda Muresan
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology

Polysynthetic languages pose a challenge for morphological analysis due to the root-morpheme complexity and to the word class “squish”. In addition, many of these polysynthetic languages are low-resource. We propose unsupervised approaches for morphological segmentation of low-resource polysynthetic languages based on Adaptor Grammars (AG) (Eskander et al., 2016). We experiment with four languages from the Uto-Aztecan family. Our AG-based approaches outperform other unsupervised approaches and show promise when compared to supervised methods, outperforming them on two of the four languages.

2018

Challenges in Speech Recognition and Translation of High-Value Low-Density Polysynthetic Languages
Judith Klavans | John Morgan | Stephen LaRocca | Jeffrey Micher | Clare Voss
Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 2: User Track)

Proceedings of the Workshop on Computational Modeling of Polysynthetic Languages
Judith L. Klavans
Proceedings of the Workshop on Computational Modeling of Polysynthetic Languages

Computational Challenges for Polysynthetic Languages
Judith L. Klavans
Proceedings of the Workshop on Computational Modeling of Polysynthetic Languages

Given advances in computational linguistic analysis of complex languages using Machine Learning as well as standard Finite State Transducers, coupled with recent efforts in language revitalization, the time was right to organize a first workshop to bring together experts in language technology and linguists on the one hand with language practitioners and revitalization experts on the other. This one-day meeting provides a promising forum to discuss new research on polysynthetic languages in combination with the needs of linguistic communities where such languages are written and spoken.

2012


Government Catalog of Language Resources (GCLR)
Judith Klavans
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Government MT User Program

The purpose of this presentation is to discuss recent efforts within the government to address issues of evaluation and return on investment. Pressure to demonstrate value has increased with the growing amount of foreign language information available, with the variety of languages needing to be exploited, and with the increasing gaps between numbers of language-enabled people and the amount of work to be done. This pressure is only growing as budgets shrink, and as global development grows. Over the past year, the ODNI has led an effort to pull together different government stakeholders to determine some baseline standards for determining Return on Investment via task-based evaluation. Stakeholder consensus on major HLT tasks has involved examination of the different approaches to determining return on investment and how it relates to the use of HLT in the workflow. In addition to reporting on the goals and progress of this group, we will present future directions and invite community input.

2010


Task-based evaluation methods for machine translation, in practice and theory
Judith L. Klavans
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program

A panel of industry and government experts will discuss ways in which they have applied task-based evaluation for Machine Translation and other language technologies in their organizations and share ideas for new methods that could be tried in the future. As part of the discussion, the panelists will address some of the following points: What task-based evaluation means within their organization, i.e., how task-based evaluation is defined; How task-based evaluation impacts the use of MT technologies in their work environment; Whether task-based evaluation correlates with MT developers' automated metrics and, if not, how to arrive at automated metrics that do correlate with the more expensive task-based evaluation; What "lessons learned" resulted from performing task-based evaluation; How task-based evaluations can be generalized to multiple workflow environments.

2009

Proceedings of the First Workshop on Language Technologies for African Languages
Lori Levin | John Kiango | Judith Klavans | Guy De Pauw | Gilles-Maurice de Schryver | Peter Waiganjo Wagacha
Proceedings of the First Workshop on Language Technologies for African Languages

2008

Relation between Agreement Measures on Human Labeling and Machine Learning Performance: Results from an Art History Domain
Rebecca Passonneau | Tom Lippincott | Tae Yano | Judith Klavans
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

We discuss factors that affect human agreement on a semantic labeling task in the art history domain, based on the results of four experiments where we varied the number of labels annotators could assign, the number of annotators, the type and amount of training they received, and the size of the text span being labeled. Using the labelings from one experiment involving seven annotators, we investigate the relation between interannotator agreement and machine learning performance. We construct binary classifiers and vary the training and test data by swapping the labelings from the seven annotators. First, we find performance is often quite good despite lower than recommended interannotator agreement. Second, we find that on average, learning performance for a given functional semantic category correlates with the overall agreement among the seven annotators for that category. Third, we find that learning performance on the data from a given annotator does not correlate with the quality of that annotator’s labeling. We offer recommendations for the use of labeled data in machine learning, and argue that learners should attempt to accommodate human variation. We also note implications for large scale corpus annotation projects that deal with similarly subjective phenomena.

2007

Concept Disambiguation for Improved Subject Access Using Multiple Knowledge Sources
Tandeep Sidhu | Judith Klavans | Jimmy Lin
Proceedings of the Workshop on Language Technology for Cultural Heritage Data (LaTeCH 2007)

Measuring Variability in Sentence Ordering for News Summarization
Nitin Madnani | Rebecca Passonneau | Necip Fazil Ayan | John Conroy | Bonnie Dorr | Judith Klavans | Dianne O’Leary | Judith Schlesinger
Proceedings of the Eleventh European Workshop on Natural Language Generation (ENLG 07)

2006

CLiMB ToolKit: A Case Study of Iterative Evaluation in a Multidisciplinary Project
Rebecca Passonneau | Roberta Blitz | David Elson | Angela Giral | Judith Klavans
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Digital image collections in libraries and other curatorial institutions grow too rapidly to create new descriptive metadata for subject matter search or browsing. CLiMB (Computational Linguistics for Metadata Building) was a project designed to address this dilemma that involved computer scientists, linguists, librarians, and art librarians. The CLiMB project followed an iterative evaluation model: each next phase of the project emerged from the results of an evaluation. After assembling a suite of text processing tools to be used in extracting metadata, we conducted a formative evaluation with thirteen participants, using a survey in which we varied the order and type of four conditions under which respondents would propose or select image search terms. Results of the formative evaluation led us to conclude that a CLiMB ToolKit would work best if its main function was to propose terms for users to review. After implementing a prototype ToolKit using a browser interface, we conducted an evaluation with ten experts. Users found the ToolKit very habitable, remained consistently satisfied throughout a lengthy evaluation, and selected a large number of terms per image.

2004

Columbia Newsblaster: Multilingual News Summarization on the Web
David Kirk Evans | Judith L. Klavans | Kathleen R. McKeown
Demonstration Papers at HLT-NAACL 2004

2003

Columbia’s Newsblaster: New Features and Future Directions
Kathleen McKeown | Regina Barzilay | John Chen | David Elson | David Evans | Judith Klavans | Ani Nenkova | Barry Schiffman | Sergey Sigelman
Companion Volume of the Proceedings of HLT-NAACL 2003 - Demonstrations

2002

Using the Annotated Bibliography as a Resource for Indicative Summarization
Min-Yen Kan | Judith L. Klavans | Kathleen R. McKeown
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

A Method for Automatically Building and Evaluating Dictionary Resources
Smaranda Muresan | Judith Klavans
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

Combining linguistic and machine learning techniques for email summarization
Smaranda Muresan | Evelyne Tzoukermann | Judith L. Klavans
Proceedings of the ACL 2001 Workshop on Computational Natural Language Learning (ConLL)

Applying Natural Language Generation to Indicative Summarization
Min-Yen Kan | Kathleen R. McKeown | Judith L. Klavans
Proceedings of the ACL 2001 Eighth European Workshop on Natural Language Generation (EWNLG)

Verification and validation of language processing systems: Is it evaluation?
Valerie Barr | Judith L. Klavans
Proceedings of the ACL 2001 Workshop on Evaluation Methodologies for Language and Dialogue Systems

GIST-IT: Combining Linguistic and Machine Learning Techniques for Email Summarization
Evelyne Tzoukermann | Smaranda Muresan | Judith L. Klavans
Proceedings of the ACL 2001 Workshop on Human Language Technology and Knowledge Management

2000

Evaluation of Automatically Identified Index Terms for Browsing Electronic Documents
Nina Wacholder | Judith L. Klavans | David K. Evans
Sixth Applied Natural Language Processing Conference

Evaluation of Computational Linguistic Techniques for Identifying Significant Topics for Browsing Applications
Judith L. Klavans | Nina Wacholder | David K. Evans
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1999

Detecting Text Similarity over Short Passages: Exploring Linguistic Feature Combinations via Machine Learning
Vasileios Hatzivassiloglou | Judith L. Klavans | Eleazar Eskin
1999 Joint SIGDAT Conference on Empirical Methods in Natural Language Processing and Very Large Corpora

1998

Role of Verbs in Document Analysis
Judith L. Klavans | Min-Yen Kan
36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, Volume 1

Role of Verbs in Document Analysis
Judith Klavans | Min-Yen Kan
COLING 1998 Volume 1: The 17th International Conference on Computational Linguistics

Linear Segmentation and Segment Significance
Min-Yen Kan | Judith L. Klavans | Kathleen R. McKeown
Sixth Workshop on Very Large Corpora

1997

Expansion of Multi-Word Terms for Indexing and Retrieval Using Morphology and Syntax
Christian Jacquemin | Judith L. Klavans | Evelyne Tzoukermann
35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics

1994

Machine-Readable Dictionaries in Text-to-Speech Systems
Judith L. Klavans | Evelyne Tzoukermann
COLING 1994 Volume 2: The 15th International Conference on Computational Linguistics

1992

Degrees of Stativity: The Lexical Representation of Verb Aspect
Judith L. Klavans | Martin Chodorow
COLING 1992 Volume 4: The 14th International Conference on Computational Linguistics

1991

A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars
E. Black | S. Abney | D. Flickenger | C. Gdaniec | R. Grishman | P. Harrison | D. Hindle | R. Ingria | F. Jelinek | J. Klavans | M. Liberman | M. Marcus | S. Roukos | B. Santorini | T. Strzalkowski
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

1990

The BICORD System Combining Lexical Information from Bilingual Corpora and Machine Readable Dictionaries
Judith Klavans | Evelyne Tzoukermann
COLING 1990 Volume 3: Papers presented to the 13th International Conference on Computational Linguistics

1988

COMPLEX: A Computational Lexicon for Natural Language Systems
Judith Klavans
Coling Budapest 1988 Volume 2: International Conference on Computational Linguistics

1987

Tools and Methods for Computational Linguistics
Roy J. Byrd | Nicoletta Calzolari | Martin S. Chodorow | Judith L. Klavans | Mary S. Neff | Omneya A. Rizk
Computational Linguistics, Formerly the American Journal of Computational Linguistics, Volume 13, Numbers 3-4, July-December 1987

1986

Computer Methods for Morphological Analysis
Roy J. Byrd | Judith L. Klavans | Mark Aronoff | Frank Anshen
24th Annual Meeting of the Association for Computational Linguistics