Mark Liberman

Also published as: M. Liberman, M. Y. Liberman, Mark Y. Liberman


2021

Proceedings of the Second Workshop on Automatic Simultaneous Translation
Hua Wu | Colin Cherry | Liang Huang | Zhongjun He | Qun Liu | Maha Elbayad | Mark Liberman | Haifeng Wang | Mingbo Ma | Ruiqing Zhang
Proceedings of the Second Workshop on Automatic Simultaneous Translation

Proceedings of the 1st Workshop on Benchmarking: Past, Present and Future
Kenneth Church | Mark Liberman | Valia Kordoni
Proceedings of the 1st Workshop on Benchmarking: Past, Present and Future

Benchmarking: Past, Present and Future
Kenneth Church | Mark Liberman | Valia Kordoni
Proceedings of the 1st Workshop on Benchmarking: Past, Present and Future

Where have we been, and where are we going? It is easier to talk about the past than the future. These days, benchmarks evolve in a more bottom-up fashion (such as papers with code). There used to be more top-down leadership from government (and industry, in the case of systems, with benchmarks such as SPEC). Going forward, there may be more top-down leadership from organizations like MLPerf and/or influencers like David Ferrucci, who was responsible for IBM’s success with Jeopardy, and has recently written a paper suggesting how the community should think about benchmarking for machine comprehension. Tasks such as reading comprehension become even more interesting as we move beyond English. Multilinguality introduces many challenges, and even more opportunities.

2020

Proceedings of the LREC 2020 Workshop on "Citizen Linguistics in Language Resource Development"
James Fiumara | Christopher Cieri | Mark Liberman | Chris Callison-Burch
Proceedings of the LREC 2020 Workshop on "Citizen Linguistics in Language Resource Development"

LanguageARC: Developing Language Resources Through Citizen Linguistics
James Fiumara | Christopher Cieri | Jonathan Wright | Mark Liberman
Proceedings of the LREC 2020 Workshop on "Citizen Linguistics in Language Resource Development"

This paper introduces the citizen science platform LanguageARC, developed within the NIEUW (Novel Incentives and Workflows) project supported by the National Science Foundation under Grant No. 1730377. LanguageARC is a community-oriented online platform bringing together researchers and “citizen linguists” with the shared goal of contributing to linguistic research and language technology development. Like other Citizen Science platforms and projects, LanguageARC harnesses the power and efforts of volunteers who are motivated by the incentives of contributing to science, learning and discovery, and belonging to a community dedicated to social improvement. Citizen linguists contribute language data and judgments by participating in research tasks such as classifying regional accents from audio clips, recording audio of picture descriptions and answering personality questionnaires to create baseline data for NLP research into autism and neurodegenerative conditions. Researchers can create projects on LanguageARC without any coding or HTML, using our Project Builder Toolkit.

Proceedings of the First Workshop on Automatic Simultaneous Translation
Hua Wu | Colin Cherry | Liang Huang | Zhongjun He | Mark Liberman | James Cross | Yang Liu
Proceedings of the First Workshop on Automatic Simultaneous Translation

A Progress Report on Activities at the Linguistic Data Consortium Benefitting the LREC Community
Christopher Cieri | James Fiumara | Stephanie Strassel | Jonathan Wright | Denise DiPersio | Mark Liberman
Proceedings of the 12th Language Resources and Evaluation Conference

This latest in a series of Linguistic Data Consortium (LDC) progress reports to the LREC community does not describe any single language resource, evaluation campaign or technology but sketches the activities, since the last report, of a data center devoted to supporting the work of LREC attendees among other research communities. Specifically, we describe 96 new corpora released in 2018-2020 to date, a new technology evaluation campaign, ongoing activities to support multiple common task human language technology programs, and innovations to advance the methodology of language data collection and annotation.

2018

Corpus Phonetics: Past, Present, and Future
Mark Liberman
Proceedings of the First Workshop on Linguistic Resources for Natural Language Processing

Invited talk

Introducing NIEUW: Novel Incentives and Workflows for Eliciting Linguistic Data
Christopher Cieri | James Fiumara | Mark Liberman | Chris Callison-Burch | Jonathan Wright
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

From ‘Solved Problems’ to New Challenges: A Report on LDC Activities
Christopher Cieri | Mark Liberman | Stephanie Strassel | Denise DiPersio | Jonathan Wright | Andrea Mazzucchi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Exploring Autism Spectrum Disorders Using HLT
Julia Parish-Morris | Mark Liberman | Neville Ryant | Christopher Cieri | Leila Bateman | Emily Ferguson | Robert Schultz
Proceedings of the Third Workshop on Computational Linguistics and Clinical Psychology

From Human Language Technology to Human Language Science
Mark Liberman
Actes de la conférence conjointe JEP-TALN-RECITAL 2016. Volume 4 : Conférences invitées

Thirty years ago, in order to get past roadblocks in Machine Translation and Automatic Speech Recognition, DARPA invented a new way to organize and manage technological R&D: a “common task” is defined by a formal quantitative evaluation metric and a body of shared training data, and researchers join an open competition to compare approaches. Over the past three decades, this method has produced steadily improving technologies, with many practical applications now possible. And Moore’s law has created a sort of digital shadow universe, which increasingly mirrors the real world in flows and stores of bits, while the same improvements in digital hardware and software make it increasingly easy to pull content out of these rivers and oceans of information. It’s natural to be excited about these technologies, where we can see an open road to rapid improvements beyond the current state of the art, and an explosion of near-term commercial applications. But there are some important opportunities in a less obvious direction. Several areas of scientific and humanistic research are being revolutionized by the application of Human Language Technology. At a minimum, orders of magnitude more data can be addressed with orders of magnitude less effort - but this change also transforms old theoretical questions, and poses new ones. And eventually, new modes of research organization and funding are likely to emerge.

Building Language Resources for Exploring Autism Spectrum Disorders
Julia Parish-Morris | Christopher Cieri | Mark Liberman | Leila Bateman | Emily Ferguson | Robert T. Schultz
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Autism spectrum disorder (ASD) is a complex neurodevelopmental condition that would benefit from low-cost and reliable improvements to screening and diagnosis. Human language technologies (HLTs) provide one possible route to automating a series of subjective decisions that currently inform “Gold Standard” diagnosis based on clinical judgment. In this paper, we describe a new resource to support this goal, comprised of 100 20-minute semi-structured English language samples labeled with child age, sex, IQ, autism symptom severity, and diagnostic classification. We assess the feasibility of digitizing and processing sensitive clinical samples for data sharing, and identify areas of difficulty. Using the methods described here, we propose to join forces with researchers and clinicians throughout the world to establish an international repository of annotated language samples from individuals with ASD and related disorders. This project has the potential to improve the lives of individuals with ASD and their families by identifying linguistic features that could improve remote screening, inform personalized intervention, and promote advancements in clinically-oriented HLTs.

2015

Sentence selection for automatic scoring of Mandarin proficiency
Jiahong Yuan | Xiaoying Xu | Wei Lai | Weiping Ye | Xinru Zhao | Mark Liberman
Proceedings of the Eighth SIGHAN Workshop on Chinese Language Processing

2014

New Directions for Language Resource Development and Distribution
Christopher Cieri | Denise DiPersio | Mark Liberman | Andrea Mazzucchi | Stephanie Strassel | Jonathan Wright
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Despite the growth in the number of linguistic data centers around the world, their accomplishments and expansions and the advances they have helped enable, the language resources that exist are a small fraction of those required to meet the goals of Human Language Technologies (HLT) for the world’s languages and the promises they offer: broad access to knowledge, direct communication across language boundaries and engagement in a global community. Using the Linguistic Data Consortium as a focus case, this paper sketches the progress of data centers, summarizes recent activities and then turns to several issues that have received inadequate attention and proposes some new approaches to their resolution.

Parser Evaluation Using Derivation Trees: A Complement to evalb
Seth Kulick | Ann Bies | Justin Mott | Anthony Kroch | Beatrice Santorini | Mark Liberman
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

A Cross-language Study on Automatic Speech Disfluency Detection
Wen Wang | Andreas Stolcke | Jiahong Yuan | Mark Liberman
Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies

2012

Twenty Years of Language Resource Development and Distribution: A Progress Report on LDC Activities
Christopher Cieri | Marian Reed | Denise DiPersio | Mark Liberman
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

On the Linguistic Data Consortium's (LDC) 20th anniversary, this paper describes the changes to the language resource landscape over the past two decades, how LDC has adjusted its practice to adapt to them and how the business model continues to grow. Specifically, we will discuss LDC's evolving roles and changes in the sizes and types of LDC language resources (LR) as well as the data they include and the annotations of that data. We will also discuss adaptations of the LDC business model and the sponsored projects it supports.

2010

Adapting to Trends in Language Resource Development: A Progress Report on LDC Activities
Christopher Cieri | Mark Liberman
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

This paper describes changing needs among the communities that exploit language resources and recent LDC activities and publications that support those needs by providing greater volumes of data and associated resources in a growing inventory of languages with ever more sophisticated annotation. Specifically, it covers the evolving role of data centers with specific emphasis on the LDC, the publications released by the LDC in the two years since our last report and the sponsored research programs that provide LRs initially to participants in those programs but eventually to the larger HLT research communities and beyond.

A New Approach to Lexical Disambiguation of Arabic Text
Rushin Shah | Paramveer S. Dhillon | Mark Liberman | Dean Foster | Mohamed Maamouri | Lyle Ungar
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

Obituary: Fred Jelinek
Mark Liberman
Computational Linguistics, Volume 36, Issue 4 - December 2010

2009

The Annotation Conundrum
Mark Liberman
Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics: Virtuous, Vicious or Vacuous?

2008

15 Years of Language Resource Creation and Sharing: a Progress Report on LDC Activities
Christopher Cieri | Mark Liberman
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper, the fifth in a series of biennial progress reports, reviews the activities of the Linguistic Data Consortium with particular emphasis on general trends in the language resource landscape and on changes that distinguish the two years since LDC’s last report at LREC from the preceding 8 years. After providing a perspective on the current landscape of language resources, the paper goes on to describe our vision of the role of LDC within the research communities it serves before briefly sketching specific publications and resource creation projects that have been the focus of our attention since the last report.

2006

The Mixer and Transcript Reading Corpora: Resources for Multilingual, Crosschannel Speaker Recognition Research
Christopher Cieri | Walt Andrews | Joseph P. Campbell | George Doddington | Jack Godfrey | Shudong Huang | Mark Liberman | Alvin Martin | Hirotaka Nakasone | Mark Przybocki | Kevin Walker
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes the planning and creation of the Mixer and Transcript Reading corpora, their properties and yields, and reports on the lessons learned during their development.

More Data and Tools for More Languages and Research Areas: A Progress Report on LDC Activities
Christopher Cieri | Mark Liberman
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This presentation reports on recent progress the Linguistic Data Consortium has made in addressing the needs of multiple research communities by collecting, annotating and distributing data, simplifying access and developing standards and tools. Specifically, it describes new trends in publication, a sample of recent projects and significant improvements to LDC Online that improve access to LDC data, especially for those with limited computing support.

Integrated Linguistic Resources for Language Exploitation Technologies
Stephanie Strassel | Christopher Cieri | Andrew Cole | Denise Dipersio | Mark Liberman | Xiaoyi Ma | Mohamed Maamouri | Kazuaki Maeda
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The Linguistic Data Consortium has recently embarked on an effort to create integrated linguistic resources and related infrastructure for language exploitation technologies within the DARPA GALE (Global Autonomous Language Exploitation) Program. GALE targets an end-to-end system consisting of three major engines: Transcription, Translation and Distillation. Multilingual speech or text from a variety of genres is taken as input and English text is given as output, with information of interest presented in an integrated and consolidated fashion to the end user. GALE's goals require a quantum leap in the performance of human language technology, while also demanding solutions that are more intelligent, more robust, more adaptable, more efficient and more integrated. LDC has responded to this challenge with a comprehensive approach to linguistic resource development designed to support GALE's research and evaluation needs and to provide lasting resources for the larger Human Language Technology community.

A Context Pattern Induction Method for Named Entity Extraction
Partha Pratim Talukdar | Thorsten Brants | Mark Liberman | Fernando Pereira
Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL-X)

2004

Integrated Annotation for Biomedical Information Extraction
Seth Kulick | Ann Bies | Mark Liberman | Mark Mandel | Ryan McDonald | Martha Palmer | Andrew Schein | Lyle Ungar | Scott Winters | Pete White
HLT-NAACL 2004 Workshop: Linking Biological Literature, Ontologies and Databases

A Progress Report from the Linguistic Data Consortium: Recent Activities in Resource Creation and Distribution and the Development of Tools and Standards
Christopher Cieri | Mark Liberman
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2002

Language Resource Creation and Distribution at the Linguistic Data Consortium: A Progress Report
Christopher Cieri | Mark Liberman
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

TIDES Language Resources: A Resource Map for Translingual Information Access
Christopher Cieri | Mark Liberman
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2000

ATLAS: A Flexible and Extensible Architecture for Linguistic Annotation
Steven Bird | David Day | John Garofolo | John Henderson | Christophe Laprun | Mark Liberman
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Issues in Corpus Creation and Distribution: The Evolution of the Linguistic Data Consortium
Christopher Cieri | Mark Liberman
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

Large, Multilingual, Broadcast News Corpora for Cooperative Research in Topic Detection and Tracking: The TDT-2 and TDT-3 Corpus Efforts
Christopher Cieri | David Graff | Mark Liberman | Nii Martey | Stephanie Strassel
Proceedings of the Second International Conference on Language Resources and Evaluation (LREC’00)

1999

Annotation Graphs as a Framework for Multidimensional Linguistic Data Analysis
Steven Bird | Mark Liberman
Towards Standards and Tools for Discourse Tagging

BITS: a method for bilingual text search over the Web
Xiaoyi Ma | Mark Y. Liberman
Proceedings of Machine Translation Summit VII

Parallel corpora are a valuable resource for machine translation, multilingual text retrieval, language education and other applications, but for various reasons their availability is very limited at present. Noting that the World Wide Web is a potential source of parallel text, researchers are working to explore the Web in order to build large collections of bitext. This paper presents BITS (Bilingual Internet Text Search), a system which harvests multilingual texts over the World Wide Web with virtually no human intervention. The technique is simple, easy to port to any language pair, and highly accurate. The results of experiments on the German-English pair show that the method is very successful.

1994

Commentary on Kaplan and Kay
Mark Liberman
Computational Linguistics, Volume 20, Number 3, September 1994

Lexicons for Human Language Technology
Mark Liberman
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994

1992

Session 10b: Core NL Lexicon and Grammar
Mark Liberman
Speech and Natural Language: Proceedings of a Workshop Held at Harriman, New York, February 23-26, 1992

1991

Session 1: Speech and Natural Language Efforts in the U. S. and Abroad
Mark Y. Liberman | Patti Price
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

A Procedure for Quantitatively Comparing the Syntactic Coverage of English Grammars
E. Black | S. Abney | D. Flickenger | C. Gdaniec | R. Grishman | P. Harrison | D. Hindle | R. Ingria | F. Jelinek | J. Klavans | M. Liberman | M. Marcus | S. Roukos | B. Santorini | T. Strzalkowski
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

1990

A Finite-State Morphological Processor for Spanish
Evelyne Tzoukermann | Mark Y. Liberman
COLING 1990 Volume 3: Papers presented to the 13th International Conference on Computational Linguistics

1989

Speaker Independent Phonetic Transcription of Fluent Speech for Large Vocabulary Speech Recognition
S. E. Levinson | M. Y. Liberman | A. Ljolje | L. G. Miller
Speech and Natural Language: Proceedings of a Workshop Held at Philadelphia, Pennsylvania, February 21-23, 1989

Text on Tap: the ACL/DCI
Mark Liberman
Speech and Natural Language: Proceedings of a Workshop Held at Cape Cod, Massachusetts, October 15-18, 1989

1987

Toward Treating English Nominals Correctly
Richard W. Sproat | Mark Y. Liberman
25th Annual Meeting of the Association for Computational Linguistics

1986

Questions about Connectionist Models of Natural Language
Mark Liberman
24th Annual Meeting of the Association for Computational Linguistics
