Ralf D. Brown

Also published as: Ralf Brown

This paper presents a method for exploiting document-level similarity between the documents in the training corpus for a corpus-driven (statistical or example-based) machine translation system and the input documents it must translate. The method is simple to implement, efficient (it increases the translation time of an example-based system by only a few percent), and robust (it still works when the actual document boundaries in the input text are not known). Experiments on French-English and Arabic-English showed relative gains on the BLEU metric of up to 7.4% and 5.4%, respectively, over the same system without document-level similarity.

2007

Improving example-based machine translation through morphological generalization and adaptation
Aaron B. Phillips | Violetta Cavalli-Sforza | Ralf D. Brown
Proceedings of Machine Translation Summit XI: Papers

2006

Spectral Clustering for Example Based Machine Translation
Rashmi Gangadharaiah | Ralf Brown | Jaime Carbonell
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers

2005

Symmetric probabilistic alignment for example-based translation
Jae Dong Kim | Ralf D. Brown | Peter J. Jansen | Jaime G. Carbonell
Proceedings of the 10th EAMT Conference: Practical applications of machine translation

Context-sensitive Retrieval for Example-based Translation
Ralf Brown
Workshop on example-based machine translation

Example-Based Machine Translation (EBMT) systems have typically operated on individual sentences without taking into account prior context. By adding a simple reweighting of retrieved fragments of training examples on the basis of whether the previous translation retrieved any fragments from examples within a small window of the current instance, translation performance is improved. A further improvement is seen by performing a similar reweighting when another fragment of the current input sentence was retrieved from the same training example. Together, a simple, straightforward implementation of these two factors results in an improvement on the order of 1.0–1.6% in the BLEU metric across multiple data sets in multiple languages.
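The two reweighting factors described in this abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the `Fragment` type, the `reweight` function, the window size, and both bonus multipliers are hypothetical names and values chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Fragment:
    example_id: int  # position of the source training example in the corpus (assumed)
    weight: float    # retrieval weight before any context adjustment


def reweight(
    fragments: List[Fragment],
    prev_example_ids: Iterable[int],
    window: int = 2,                  # hypothetical window size
    context_bonus: float = 1.5,       # hypothetical multiplier
    same_example_bonus: float = 1.2,  # hypothetical multiplier
) -> List[Fragment]:
    """Boost fragments that are supported by prior or sentence-internal context."""
    prev_ids = list(prev_example_ids)
    # How many fragments of the current sentence came from each training example.
    per_example = Counter(f.example_id for f in fragments)
    result = []
    for f in fragments:
        w = f.weight
        # Factor 1: the previous sentence retrieved something from a nearby example.
        if any(abs(f.example_id - p) <= window for p in prev_ids):
            w *= context_bonus
        # Factor 2: another fragment of this sentence came from the same example.
        if per_example[f.example_id] > 1:
            w *= same_example_bonus
        result.append(Fragment(f.example_id, w))
    return result
```

In this sketch the two bonuses are simply multiplied together; the abstract does not say how the actual system combines them.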

Symmetric Probabilistic Alignment
Ralf D. Brown | Jae Dong Kim | Peter J. Jansen | Jaime G. Carbonell
Proceedings of the ACL Workshop on Building and Using Parallel Texts

2004

A modified Burrows-Wheeler transform for highly scalable example-based translation
Ralf D. Brown
Proceedings of the 6th Conference of the Association for Machine Translation in the Americas: Technical Papers

The Burrows-Wheeler Transform (BWT) was originally developed for data compression, but can also be applied to indexing text. In this paper, an adaptation of the BWT to word-based indexing of the training corpus for an example-based machine translation (EBMT) system is presented. The adapted BWT embeds the information needed to retrieve matched training instances without requiring any additional space, and it can be instantiated in a compressed form that reduces disk space and memory requirements by about 40% while remaining searchable without decompression. Both the speed advantage of O(log N) lookups over the O(N) lookups of the previously used inverted-file index and the structure of the index itself enable additional capabilities and faster run-time performance. Because the BWT groups all instances of any n-gram together, it can be used to quickly enumerate the most frequent n-grams, for which translations can be precomputed and stored, resulting in an order-of-magnitude speedup at run time.
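The O(log N) lookup and the grouping of n-gram instances can be illustrated with a word-level suffix array, a closely related but simpler structure. This is a sketch under that substitution, not the paper's modified BWT: it has no compression, and the function names `build_suffix_array` and `find_ngram` are hypothetical.

```python
from typing import List


def build_suffix_array(tokens: List[str]) -> List[int]:
    """Return start positions of all word-level suffixes in sorted order.

    Naive construction, suitable only for small illustrative corpora.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_ngram(tokens: List[str], sa: List[int], ngram: List[str]) -> List[int]:
    """Return the sorted corpus positions where `ngram` occurs.

    All suffixes starting with `ngram` are contiguous in `sa`, so two
    binary searches (O(log N) comparisons each) locate the whole group.
    """
    n = len(ngram)

    def prefix(k: int) -> List[str]:
        # First n words of the k-th smallest suffix.
        return tokens[sa[k]:sa[k] + n]

    # Lower bound: first suffix whose n-word prefix is >= ngram.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < ngram:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose n-word prefix is > ngram.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= ngram:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])
```

Because every occurrence of an n-gram sits in one contiguous range of the sorted suffixes, the size of that range gives the n-gram's frequency directly, which is the property the abstract exploits for enumerating frequent n-grams.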

Challenges in using an example-based MT system for a transnational digital government project
Violetta Cavalli-Sforza | Ralf D. Brown | Jaime G. Carbonell | Peter G. Jansen | Jae Dong Kim
Proceedings of the 9th EAMT Workshop: Broadening horizons of machine translation and its applications

2003

Reducing boundary friction using translation-fragment overlap
Ralf D. Brown | Rebecca Hutchinson | Paul N. Bennett | Jaime G. Carbonell | Peter Jansen
Proceedings of Machine Translation Summit IX: Papers

Many corpus-based Machine Translation (MT) systems generate a number of partial translations which are then pieced together rather than immediately producing one overall translation. While this makes them more robust to ill-formed input, they are subject to disfluencies at phrasal translation boundaries even for well-formed input. We address this “boundary friction” problem by introducing a method that exploits overlapping phrasal translations and the increased confidence in translation accuracy they imply. We specify an efficient algorithm for producing translations using overlap. Finally, our empirical analysis indicates that this approach produces higher quality translations than the standard method of combining non-overlapping fragments generated by our Example-Based MT (EBMT) system in a peak-to-peak comparison.

2002

Corpus-driven splitting of compound words
Ralf Brown
Proceedings of the 9th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Papers

Example-based machine translation
Ralf Brown
Proceedings of the 5th Conference of the Association for Machine Translation in the Americas: Tutorial Descriptions

Machine Translation of minority languages presents unique challenges, including the paucity of bilingual training data and the unavailability of linguistically-trained speakers. This paper focuses on a machine learning approach to transfer-based MT, where data in the form of translations and lexical alignments are elicited from bilingual speakers, and a seeded version-space learning algorithm formulates and refines transfer rules. A rule-generalization lattice is defined based on LFG-style f-structures, permitting generalization operators in the search for the most general rules consistent with the elicited data. The paper presents these methods and illustrates examples.

Speech Translation on a Tight Budget without Enough Data
Robert E. Frederking | Alan W. Black | Ralf D. Brown | Alexander Rudnicky | John Moody | Eric Steinbrecher
Proceedings of the ACL-02 Workshop on Speech-to-Speech Translation: Algorithms and Systems

Field Testing the Tongues Speech-to-Speech Machine Translation System
Robert E. Frederking | Alan W. Black | Ralf D. Brown | John Moody | Eric Steinbrecher
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)

2001

Pre-processing of bilingual corpora for Mandarin-English EBMT
Ying Zhang | Ralf Brown | Robert Frederking | Alon Lavie
Proceedings of Machine Translation Summit VIII

Pre-processing of bilingual corpora plays an important role in Example-Based Machine Translation (EBMT) and Statistical-Based Machine Translation (SBMT). For our Mandarin-English EBMT system, pre-processing includes segmentation for Mandarin, bracketing for English, and building a statistical dictionary from the corpora. We used the Mandarin segmenter from the Linguistic Data Consortium (LDC), which uses dynamic programming with a frequency dictionary to segment the text. Although the frequency dictionary is large, it does not completely cover the corpora. In this paper, we describe the work we have done to improve the segmentation for Mandarin and the bracketing process for English to increase the length of English phrases. A statistical dictionary is built from the aligned bilingual corpus and is used as feedback to segmentation and bracketing to re-segment and re-bracket the corpus. The process iterates several times to achieve better results. The final results of the corpus pre-processing are a segmented and bracketed aligned bilingual corpus and a statistical dictionary. We achieved positive results, increasing the average length of Chinese terms by about 60% and of English terms by about 10%. The statistical dictionary gained about a 30% increase in coverage.
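Dynamic-programming segmentation with a frequency dictionary can be sketched as follows. This is a generic unigram formulation and not the LDC segmenter's actual algorithm: the `segment` function, the `max_len` limit, the unknown-character penalty, and the toy dictionary in the usage note are all assumptions made for illustration.

```python
import math
from typing import Dict, List


def segment(text: str, freq: Dict[str, int], max_len: int = 4) -> List[str]:
    """Split `text` into the word sequence with the highest unigram log-probability.

    Assumes `freq` is non-empty. Characters not covered by the dictionary
    fall back to single-character words with a fixed penalty (an assumption).
    """
    total = sum(freq.values())
    unknown_logp = math.log(1.0 / (10.0 * total))

    n = len(text)
    best_score = [-math.inf] * (n + 1)  # best log-probability of text[:i]
    back = [0] * (n + 1)                # start index of the last word in that best split
    best_score[0] = 0.0

    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in freq:
                logp = math.log(freq[word] / total)
            elif len(word) == 1:
                logp = unknown_logp
            else:
                continue  # unknown multi-character strings are not candidates
            score = best_score[start] + logp
            if score > best_score[end]:
                best_score[end] = score
                back[end] = start

    # Follow the back-pointers from the end of the text to recover the words.
    words = []
    pos = n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    return words[::-1]
```

With a toy dictionary such as `{"北京": 50, "大学": 40, "北京大学": 30, "北": 5, "京": 5, "大": 5, "学": 5}`, the longer entry "北京大学" is kept as one word because its single log-probability exceeds the sum for "北京" plus "大学". Adding longer terms to the dictionary in this way is one route by which feedback from a statistical dictionary could lengthen the segmented terms.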

Transfer-rule induction for example-based translation
Ralf D. Brown
Workshop on Example-Based Machine Translation

NICE is a machine translation project for low-density languages. We are building a tool that will elicit a controlled corpus from a bilingual speaker who is not an expert in linguistics. The corpus is intended to cover major typological phenomena, as it is designed to work for any language. Using implicational universals, we strive to minimize the number of sentences that each informant has to translate. From the elicited sentences, we learn transfer rules with a version space algorithm. Our vision for MT in the future is one in which systems can be quickly trained for new languages by native speakers, so that speakers of minor languages can participate in education, health care, government, and the internet without having to give up their languages.

Adapting an Example-Based Translation System to Chinese
Ying Zhang | Ralf D. Brown | Robert E. Frederking
Proceedings of the First International Conference on Human Language Technology Research

A Server for Real-Time Event Tracking in News
Ralf D. Brown
Proceedings of the First International Conference on Human Language Technology Research