Robert Frederking

Also published as: Robert E. Frederking

2014

This paper describes a suite of tools for extracting conventionalized metaphors in English, Spanish, Farsi, and Russian. The method depends on three significant resources for each language: a corpus of conventionalized metaphors, a table of conventionalized conceptual metaphors (CCM table), and a set of extraction rules. Conventionalized metaphors are things like “escape from poverty” and “burden of taxation”. For each metaphor, the CCM table contains the metaphorical source domain word (such as “escape”) the target domain word (such as “poverty”) and the grammatical construction in which they can be found. The extraction rules operate on the output of a dependency parser and identify the grammatical configurations (such as a verb with a prepositional phrase complement) that are likely to contain conventional metaphors. We present results on detection rates for conventional metaphors and analysis of the similarity and differences of source domains for conventional metaphors in the four languages.

pdf abs
The CMU METAL Farsi NLP Approach
Weston Feely | Mehdi Manshadi | Robert Frederking | Lori Levin
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

While many high-quality tools are available for analyzing major languages such as English, equivalent freely-available tools for important but lower-resourced languages such as Farsi are more difficult to acquire and integrate into a useful NLP front end. We report here on an accurate and efficient Farsi analysis front end that we have assembled, which may be useful to others who wish to work with written Farsi. The pre-existing components and resources that we incorporated include the Carnegie Mellon TurboParser and TurboTagger (Martins et al., 2010) trained on the Dadegan Treebank (Rasooli et al., 2013), the Uppsala Farsi text normalizer PrePer (Seraji, 2013), the Uppsala Farsi tokenizer (Seraji et al., 2012a), and Jon Dehdaris PerStem (Jadidinejad et al., 2010). This set of tools (combined with additional normalization and tokenization modules that we have developed and made available) achieves a dependency parsing labeled attachment score of 89.49%, unlabeled attachment score of 92.19%, and label accuracy score of 91.38% on a held-out parsing test data set. All of the components and resources used are freely available. In addition to describing the components and resources, we also explain the rationale for our choices.

2012

pdf abs
Supervised Topical Key Phrase Extraction of News Stories using Crowdsourcing, Light Filtering and Co-reference Normalization
Luís Marujo | Anatole Gershman | Jaime Carbonell | Robert Frederking | João P. Neto
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Fast and effective automated indexing is critical for search and personalized services. Key phrases that consist of one or more words and represent the main concepts of the document are often used for the purpose of indexing. In this paper, we investigate the use of additional semantic features and pre-processing steps to improve automatic key phrase extraction. These features include the use of signal words and freebase categories. Some of these features lead to significant improvements in the accuracy of the results. We also experimented with 2 forms of document pre-processing that we call light filtering and co-reference normalization. Light filtering removes sentences from the document, which are judged peripheral to its main content. Co-reference normalization unifies several written forms of the same named entity into a unique form. We also needed a Gold Standard ― a set of labeled documents for training and evaluation. While the subjective nature of key phrase selection precludes a true Gold Standard, we used Amazon's Mechanical Turk service to obtain a useful approximation. Our data indicates that the biggest improvements in performance were due to shallow semantic features, news categories, and rhetorical signals (nDCG 78.47% vs. 68.93%). The inclusion of deeper semantic features such as Freebase sub-categories was not beneficial by itself, but in combination with pre-processing, did cause slight improvements in the nDCG scores.

2010

pdf
CONE: Metrics for Automatic Evaluation of Named Entity Co-Reference Resolution
Bo Lin | Rushin Shah | Robert Frederking | Anatole Gershman
Proceedings of the 2010 Named Entities Workshop

2008

pdf
Inductive Detection of Language Features via Clustering Minimal Pairs: Toward Feature-Rich Grammars in Machine Translation
Jonathan H. Clark | Robert Frederking | Lori Levin
Proceedings of the ACL-08: HLT Second Workshop on Syntax and Structure in Statistical Translation (SSST-2)

pdf bib
Coling 2008: Proceedings of the workshop on Speech Processing for Safety Critical Translation and Pervasive Applications
Pierrette Bouillon | Farzad Ehsani | Robert Frederking | Michael McTear | Manny Rayner
Coling 2008: Proceedings of the workshop on Speech Processing for Safety Critical Translation and Pervasive Applications

pdf
Speech Translation for Triage of Emergency Phonecalls in Minority Languages
Udhyakumar Nallasamy | Alan Black | Tanja Schultz | Robert Frederking | Jerry Weltman
Coling 2008: Proceedings of the workshop on Speech Processing for Safety Critical Translation and Pervasive Applications

pdf abs
Toward Active Learning in Data Selection: Automatic Discovery of Language Features During Elicitation
Jonathan Clark | Robert Frederking | Lori Levin
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Data Selection has emerged as a common issue in language technologies. We define Data Selection as the choosing of a subset of training data that is most effective for a given task. This paper describes deductive feature detection, one component of a data selection system for machine translation. Feature detection determines whether features such as tense, number, and person are expressed in a language. The database of the The World Atlas of Language Structures provides a gold standard against which to evaluate feature detection. The discovered features can be used as input to a Navigator, which uses active learning to determine which piece of language data is the most important to acquire next.

pdf abs
NineOneOne: Recognizing and Classifying Speech for Handling Minority Language Emergency Calls
Udhyakumar Nallasamy | Alan Black | Tanja Schultz | Robert Frederking
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In this paper, we describe NineOneOne (9-1-1), a system designed to recognize and translate Spanish emergency calls for better dispatching. We analyze the research challenges in adapting speech translation technology to 9-1-1 domain. We report our initial research towards building the system and the results of our initial experiments.

Producing machine translation (MT) for the many minority languages in the world is a serious challenge. Minority languages typically have few resources for building MT systems. For many minor languages there is little machine readable text, few knowledgeable linguists, and little money available for MT development. For these reasons, our research programs on minority language MT have focused on leveraging to the maximum extent two resources that are available for minority languages: linguistic structure and bilingual informants. All natural languages contain linguistic structure. And although the details of that linguistic structure vary from language to language, language universals such as context-free syntactic structure and the paradigmatic structure of inflectional morphology, allow us to learn the specific details of a minority language. Similarly, most minority languages possess speakers who are bilingual with the major language of the area. This paper discusses our efforts to utilize linguistic structure and the translation information that bilingual informants can provide in three sub-areas of our rapid development MT program: morphology induction, syntactic transfer rule learning, and refinement of imperfect learned rules.

In this document we will describe a semi-automated process for creating elicitation corpora. An elicitation corpus is translated by a bilingual consultant in order to produce high quality word aligned sentence pairs. The corpus sentences are automatically generated from detailed feature structures using the GenKit generation program. Feature structures themselves are automatically generated from information that is provided by a linguist using our corpus specification software. This helps us to build small, flexible corpora for testing and development of machine translation systems.

2003

pdf
JAVELIN: A Flexible, Planner-Based Architecture for Question Answering
Eric Nyberg | Robert Frederking
Companion Volume of the Proceedings of HLT-NAACL 2003 - Demonstrations

pdf abs
Teaching machine translation in a graduate language technologies program
Teruko Mitamura | Eric Nyberg | Robert Frederking
Workshop on Teaching Translation Technologies and Tools

This paper describes a graduate-level machine translation (MT) course taught at the Language Technologies Institute at Carnegie Mellon University. Most of the students in the course have a background in computer science. We discuss what we teach (the course syllabus), and how we teach it (lectures, homeworks, and projects). The course has evolved steadily over the past several years to incorporate refinements in the set of course topics, how they are taught, and how students “learn by doing”. The course syllabus has also evolved in response to changes in the field of MT and the role that MT plays in various social contexts.

2002

pdf
Design and Evolution of a Language Technologies Curriculum
Robert Frederking | Eric H. Nyberg | Teruko Mitamura | Jaime G. Carbonell
Proceedings of the ACL-02 Workshop on Effective Tools and Methodologies for Teaching Natural Language Processing and Computational Linguistics

pdf
Speech Translation on a Tight Budget without Enough Data
Robert E. Frederking | Alan W. Black | Ralf D. Brown | Alexander Rudnicky | John Moody | Eric Steinbrecher
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems

pdf
Field Testing the Tongues Speech-to-Speech Machine Translation System
Robert E. Frederking | Alan W. Black | Ralf D. Brown | John Moody | Eric Steinbrecher
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

pdf abs
The NESPOLE! speech-to-speech translation system
Alon Lavie | Lori Levin | Robert Frederking | Fabio Pianesi
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: System Descriptions

NESPOLE! is a speech-to-speech machine translation research system designed to provide fully functional speech-to-speech capabilities within real-world settings of common users involved in e-commerce applications. The project is funded jointly by the European Commission and the US NSF. The NESPOLE! system uses a client-server architecture to allow a common user, who is browsing web-pages on the internet, to connect seamlessly in real-time to an agent of the service provider, using a video-conferencing channel and with speech-to-speech translation services mediating the conversation. Shared web pages and annotated images supported via a Whiteboard application are available to enhance the communication.

2001

pdf bib
Adapting an Example-Based Translation System to Chinese
Ying Zhang | Ralf D. Brown | Robert E. Frederking
Proceedings of the First International Conference on Human Language Technology Research

pdf abs
Pre-processing of bilingual corpora for Mandarin-English EBMT
Ying Zhang | Ralf Brown | Robert Frederking | Alon Lavie
Proceedings of Machine Translation Summit VIII

Pre-processing of bilingual corpora plays an important role in Example-Based Machine Translation (EBMT) and Statistical-Based Machine Translation (SBMT). For our Mandarin-English EBMT system, pre-processing includes segmentation for Mandarin, bracketing for English and building a statistical dictionary from the corpora. We used the Mandarin segmenter from the Linguistic Data Consortium (LDC). It uses dynamic programming with a frequency dictionary to segment the text. Although the frequency dictionary is large, it does not completely cover the corpora. In this paper, we describe the work we have done to improve the segmentation for Mandarin and the bracketing process for English to increase the length of English phrases. A statistical dictionary is built from the aligned bilingual corpus. It is used as feedback to segmentation and bracketing to re-segment / re-bracket the corpus. The process iterates several times to achieve better results. The final results of the corpus pre-processing are a segmented/bracketed aligned bilingual corpus and a statistical dictionary. We achieved positive results by increasing the average length of Chinese terms about 60% and 10% for English. The statistical dictionary gained about a 30% increase in coverage.

2000

pdf
WebDIPLOMAT: A Web-Based Interactive Machine Translation System
Christopher Hogan | Robert Frederking
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics

1999

pdf abs
A new approach to the translating telephone
Robert Frederking | Christopher Hogan | Alexander Rudnicky
Proceedings of Machine Translation Summit VII

The Translating Telephone has been a major goal of speech translation for many years. Previous approaches have attempted to work from limited-domain, fully-automatic translation towards broad-coverage, fully-automatic translation. We are approaching the problem from a different direction: starting with a broad-coverage but not fully-automatic system, and working towards full automation. We believe that working in this direction will provide us with better feedback, by observing users and collecting language data under realistic conditions, and thus may allow more rapid progress towards the same ultimate goal. Our initial approach relies on the wide-spread availability of Internet connections and web browsers to provide a user interface. We describe our initial work, which is an extension of the Diplomat wearable speech translator.

1998

pdf abs
An evaluation of the multi-engine MT architecture
Christopher Hogan | Robert E. Frederking
Proceedings of the Third Conference of the Association for Machine Translation in the Americas: Technical Papers

The Multi-Engine MT (MEMT) architecture combines the outputs of multiple MT engines using a statistical language model of the target language. It has been used successfully in a number of MT research systems, for both text and speech translation. Despite its perceived benefits, there has never been a rigorous, published, double-blind evaluation of the claim that the combined output of a MEMT system is in fact better than that of any one of the component MT engines. We report here the results of such an evaluation. The combined MEMT output is shown to indeed be better overall than the output of the component engines in a Croatian ↔ English MT system. This result is consistent in both translation directions, and between different raters.