This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
LindaBrandschain
Fixing paper assignments
Please select all papers that belong to the same person.
Indicate below which author they should be assigned to.
The Greybeard Project was designed so as to enable research in speaker recognition using data that have been collected over a long period of time. Since 1994, LDC has been collecting speech samples for use in research and evaluations. By mining our earlier collections we assembled a list of subjects who had participated in multiple studies. These participants were then contacted and asked to take part in the Greybeard Project. The only constraint was that the participants must have made numerous calls in prior studies and the calls had to be a minimum of two years old. The archived data was sorted by participant and subsequent calls were added to their files. This is the first longitudinal study of its kind. The resulting corpus contains multiple calls for each participant that span anywhere from two to 12 years in time. It is our hope that these data will enable speaker recognition researchers to explore the effects of aging on voice.
Linguistic Data Consortiums Human Subjects Data Collection lab conducts multi-modal speech collections to develop corpora for use in speech, speaker and language research and evaluations. The Mixer collections have evolved over the years to best accommodate the ever changing needs of the research community and to hopefully keep one step ahead by providing increasingly challenging data. Over the years Mixer collections have grown to include socio-linguistic interviews, a wide variety of telephone conditions and multiple languages, recording conditions, channels and speech acts.. Mixer 6 was the most recent collection. This paper describes the Mixer 6 Phase 1 project. Mixer 6 Phase 1 was a study supporting linguistic research, technology development and education. The object of this study was to record speech in a variety of situations that vary formality and model multiple naturally occurring interactions as well as a variety of channel conditions
The original Mixer corpus was designed to satisfy developing commercial and forensic needs. The resulting Mixer corpora, Phases 1 through 5, have evolved to support and increasing variety of research tasks, including multilingual and cross-channel recognition. The Mixer Phases 4 and 5 corpora feature a wider variety of channels and greater variation in the situations under which the speech is recorded. This paper focuses on the plans, progress and results of Mixer 4 and 5.
The goal of the DARPA MADCAT (Multilingual Automatic Document Classification Analysis and Translation) Program is to automatically convert foreign language text images into English transcripts, for use by humans and downstream applications. The first phase the program focuses on translation of handwritten Arabic documents. Linguistic Data Consortium (LDC) is creating publicly available linguistic resources for MADCAT technologies, on a scale and richness not previously available. Corpora will consist of existing LDC corpora and data donations from MADCAT partners, plus new data collection to provide high quality material for evaluation and to address strategic gaps (for genre, dialect, image quality, etc.) in the existing resources. Training and test data properties will expand over time to encompass a wide range of topics and genres: letters, diaries, training manuals, brochures, signs, ledgers, memos, instructions, postcards and forms among others. Data will be ground truthed, with line, word and token segmentation and zoning, and translations and word alignments will be produced for a subset. Evaluation data will be carefully selected from the available data pools and high quality references will be produced, which can be used to compare MADCAT system performance against the human-produced gold standard.