2016
Amplexor MTExpert – machine translation adapted to the translation workflow
Alexandru Ceausu, Sabine Hunsicker, Tudy Droumaguet
Proceedings of the 19th Annual Conference of the European Association for Machine Translation: Projects/Products
2014
Pre-ordering of phrase-based machine translation input in translation workflow
Alexandru Ceausu, Sabine Hunsicker
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
Word reordering is a difficult task for decoders when the languages involved differ significantly in syntax. Phrase-based statistical machine translation (PBSMT), preferred in commercial settings due to its maturity, is particularly prone to errors in long-range reordering. Source sentence pre-ordering, applied as a pre-processing step before PBSMT, has proved to be an efficient solution that requires only limited resources. We propose a dependency-based pre-ordering model whose parameters are optimized using a reordering score. The source sentence is pre-ordered and then translated using an existing phrase-based system. The proposed solution is very simple to implement: it uses a hierarchical phrase-based statistical machine translation system (HPBSMT) for pre-ordering, combined with a PBSMT system for the actual translation. We show that, in a translation workflow with German as the source language, the system can provide alternative translations that require less post-editing effort.
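The abstract does not name the reordering score used to tune the pre-ordering parameters. One common choice for this kind of tuning is a normalized Kendall's tau between the predicted word order and a reference order derived from word alignments; the Python sketch below assumes that choice, and the function name `kendall_tau_score` and its input format are hypothetical.

```python
from itertools import combinations


def kendall_tau_score(permutation: list[int]) -> float:
    """Normalized Kendall's tau similarity to the reference order, in [0, 1].

    Assumed input format: permutation[i] is the reference (target-like)
    position of the word that the pre-ordering model placed at position i.
    1.0 means the words are already in reference order.
    """
    n = len(permutation)
    if n < 2:
        return 1.0  # nothing to reorder
    # Count word pairs that appear in the wrong relative order.
    discordant = sum(
        1 for i, j in combinations(range(n), 2) if permutation[i] > permutation[j]
    )
    max_pairs = n * (n - 1) / 2
    return 1.0 - discordant / max_pairs
```

A perfectly pre-ordered sentence scores 1.0 and a fully reversed one scores 0.0; under this assumption, a tuning loop would keep the model parameters that maximize the average score over a development set.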
Machine translation quality estimation adapted to the translation workflow
Sabine Hunsicker, Alexandru Ceausu
Proceedings of Translating and the Computer 36
2012
IPTranslator: Facilitating Patent Search with Machine Translation
John Tinsley, Alexandru Ceausu, Jian Zhang, Heidi Depraetere, Joeri Van de Walle
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Commercial MT User Program
Intellectual Property professionals frequently need to carry out patent searches for a variety of reasons. During a typical search, they retrieve approximately 30% of their results in a foreign language. The machine translation (MT) options currently available to patent searchers for these foreign-language patents vary in quality, consistency, and general level of service. In this article, we introduce IPTranslator, an MT web service designed to cater to the needs of patent searchers. At the core of IPTranslator is a set of MT systems developed specifically for translating patent text. We describe the challenges faced in adapting MT technology to such a complex domain, and how the systems were evaluated to ensure that their quality was fit for purpose. Finally, we present the framework through which the IPTranslator service is delivered to users, and the value-adding features that address many of the issues with existing solutions.
PLUTO: Automated Solutions for Patent Translation
John Tinsley, Alexandru Ceausu, Jian Zhang
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)
2011
Experiments on Domain Adaptation for Patent Machine Translation in the PLuTO project
Alexandru Ceauşu, John Tinsley, Jian Zhang, Andy Way
Proceedings of the 15th Annual Conference of the European Association for Machine Translation
An Expectation Maximization Algorithm for Textual Unit Alignment
Radu Ion, Alexandru Ceauşu, Elena Irimia
Proceedings of the 4th Workshop on Building and Using Comparable Corpora: Comparable Corpora and the Web
2008
DIAC+: a Professional Diacritics Recovering System
Dan Tufiş, Alexandru Ceauşu
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
In languages that use diacritical characters, if these special signs are stripped off a word, the resulting string of characters may not exist in the language, in which case its normative form is generally easy to recover. However, this is not always the case: the presence or absence of a diacritical sign on a base letter of a word that exists in both variants may change its grammatical properties or even its meaning, making the recovery of the missing diacritics a difficult task, not only for a program but sometimes even for a human reader. We describe and evaluate an accurate knowledge-based system for the automatic recovery of missing diacritics in MS-Office documents written in Romanian. For the rare cases in which the system cannot make a reliable decision, it either provides the user with a list of words and their recovery suggestions, or probabilistically chooses one of the possible changes but leaves a trace (a highlighted comment) on each word whose modification was uncertain.
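The easy case described above, where a stripped form maps to exactly one word, can be illustrated with a plain lexicon lookup. The Python sketch below is not DIAC+, which is knowledge-based and can also choose probabilistically among alternatives; it assumes a simple word list as the lexicon, and the helper names are hypothetical. Ambiguous forms are returned unchanged together with their candidates, mirroring the suggestion list offered for uncertain cases.

```python
import unicodedata
from collections import defaultdict


def strip_diacritics(word: str) -> str:
    """Drop combining marks after canonical decomposition (ș -> s, ă -> a)."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def build_index(lexicon: list[str]) -> dict[str, set[str]]:
    """Map each diacritic-free form to all lexicon words that reduce to it."""
    index: dict[str, set[str]] = defaultdict(set)
    for word in lexicon:
        index[strip_diacritics(word)].add(word)
    return dict(index)


def recover(token: str, index: dict[str, set[str]]) -> tuple[str, list[str]]:
    """Return (restored token, alternatives); a non-empty alternatives list
    marks an uncertain case that should be flagged for the user."""
    candidates = sorted(index.get(strip_diacritics(token), set()))
    if len(candidates) == 1:
        return candidates[0], []  # only one word matches: restore it
    if not candidates:
        return token, []          # unknown word: leave it unchanged
    return token, candidates      # ambiguous: keep token, suggest options
```

For instance, "peste" (over) and "pește" (fish) share the stripped form "peste", so the sketch returns both as alternatives instead of guessing, while "stiintific" is restored to "științific".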
RACAI’s Linguistic Web Services
Dan Tufiş, Radu Ion, Alexandru Ceauşu, Dan Ştefănescu
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
Nowadays, there are hundreds of Natural Language Processing applications and resources for different languages that, with a few notable exceptions, are developed and/or used almost exclusively by their creators. Assuming that the right to use a particular application or resource is licensed by the rightful owner, the user still faces the often difficult task of interfacing it with his/her own systems. Even where standards provide a unified way of encoding resources, resources are rarely encoded in conformance with the standard (and, at present, there is no such thing as general NLP application interoperability). The Semantic Web came with the promise that the web would become a universal medium for information exchange, whatever its content. In this context, the present article outlines a collection of linguistic web services for Romanian and English, developed at the Research Institute for Artificial Intelligence of the Romanian Academy (RACAI), which provide a standardized way of calling particular NLP operations and extracting the results without caring about what exactly is going on in the background.
Unsupervised Lexical Acquisition for Part of Speech Tagging
Dan Tufiş, Elena Irimia, Radu Ion, Alexandru Ceauşu
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
It is known that POS tagging is not very accurate for unknown words (words which the POS tagger has not seen in the training corpora). Thus, a first step towards improving tagging accuracy is to extend the coverage of the tagger's learned lexicon. It turns out that, through a simple procedure, one can extend this lexicon without using additional, hard-to-obtain, hand-validated training corpora. The basic idea consists of adding new words along with their (correct) POS tags to the lexicon and estimating the lexical distribution of these words according to similar ambiguity classes already present in the lexicon. We present a method for automatically acquiring high-quality POS tagging lexicons based on morphological analysis and generation. Currently, the procedure works for Romanian, for which we have the required paradigmatic generation procedure, but the architecture remains general: given appropriate substitutes for the morphological generator and the POS tagger, one should obtain similar results.
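One way to read "estimating the lexical distribution of new words according to similar ambiguity classes" is to give a new word the average tag distribution of the known words that share its ambiguity class (its set of possible tags). The Python sketch below assumes that reading, uses toy tags instead of a real Romanian tagset, and omits the morphological generation step; all names are hypothetical.

```python
from collections import defaultdict

Lexicon = dict[str, dict[str, int]]  # word -> {tag: corpus count}


def class_distributions(lexicon: Lexicon) -> dict[frozenset[str], dict[str, float]]:
    """Average P(tag | word) over all known words sharing an ambiguity class."""
    sums: dict = defaultdict(lambda: defaultdict(float))
    members: dict = defaultdict(int)
    for tag_counts in lexicon.values():
        total = sum(tag_counts.values())
        cls = frozenset(tag_counts)  # ambiguity class = set of possible tags
        members[cls] += 1
        for tag, count in tag_counts.items():
            sums[cls][tag] += count / total
    return {
        cls: {tag: s / members[cls] for tag, s in tag_sums.items()}
        for cls, tag_sums in sums.items()
    }


def estimate_new_word(tags: set[str], distributions) -> dict[str, float]:
    """Lexical distribution for a new word whose possible tags are known
    (e.g. from morphological analysis) but which has no corpus counts."""
    cls = frozenset(tags)
    if not cls:
        return {}
    if cls in distributions:
        return dict(distributions[cls])
    # Unseen ambiguity class: fall back to a uniform distribution.
    return {tag: 1.0 / len(cls) for tag in cls}
```

With two known NOUN/VERB words distributed 20/80 and 40/60, a new word with the same ambiguity class receives 0.3 for NOUN and 0.7 for VERB.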
2006
Dependency-Based Phrase Alignment
Radu Ion, Alexandru Ceauşu, Dan Tufiş
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Phrase alignment is the task of aligning the constituent phrases of the two halves of a bitext. In order to align phrases, one must first discover them; this article presents a method of aligning phrases that are discovered automatically. Here, a 'phrase' is understood as a subtree of a dependency-like structure of a sentence called a linkage. To discover phrases, we use two distinct, language-independent methods: the IBM-1 model (Brown et al., 1993) adapted to detect linkages, and Constrained Lexical Attraction Models (Ion & Barbu Mititelu, 2006). The two methods are combined and the resulting model is used to annotate the bitext. Phrase alignment accuracy is evaluated by deriving word alignments from the link alignments and then measuring the F-measure of the resulting word aligner.
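The IBM-1 model mentioned above is estimated with expectation maximization over translation probabilities t(f | e). The Python sketch below is plain word-level IBM Model 1 with a NULL token, not the linkage-detecting adaptation used in the paper; the function name and the toy bitext are illustrative only.

```python
from collections import defaultdict

Bitext = list[tuple[list[str], list[str]]]  # (source tokens f, target tokens e)


def ibm1(bitext: Bitext, iterations: int = 10) -> dict[tuple[str, str], float]:
    """Estimate t(f | e) with IBM Model 1 EM; a NULL token is prepended to
    every target sentence so that source words may stay unaligned."""
    f_vocab = {f for fs, _ in bitext for f in fs}
    # Uniform initialization of t(f | e).
    t: dict = defaultdict(lambda: 1.0 / len(f_vocab))
    for _ in range(iterations):
        count: dict = defaultdict(float)  # expected counts c(f, e)
        total: dict = defaultdict(float)  # expected counts c(e)
        for fs, es in bitext:
            es_null = ["NULL"] + es
            for f in fs:
                # E-step: distribute f's alignment probability over all e.
                norm = sum(t[(f, e)] for e in es_null)
                for e in es_null:
                    c = t[(f, e)] / norm
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalize expected counts per target word.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return dict(t)
```

Trained on the three pairs (das haus / the house), (das buch / the book) and (ein buch / a book), t(haus | house) ends up larger than t(das | house), because "das" is largely explained by "the".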
Acquis Communautaire Sentence Alignment using Support Vector Machines
Alexandru Ceauşu, Dan Ştefănescu, Dan Tufiş
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
A sentence aligner must not only be accurate, since its errors can affect further processing, but also require few computational resources and be independent of the language pair. Although many implementations do not use translation equivalents because these depend on the language pair, this feature is required for increasing accuracy. The paper presents a hybrid sentence aligner with two alignment iterations. The first iteration is based mostly on sentence length, and the second on a translation equivalents table estimated from the results of the first iteration. The aligner uses a Support Vector Machine classifier to discriminate between positive and negative examples of sentence pairs.
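The two iterations can be illustrated with the kind of features they rely on: length ratios for the first pass, and coverage by a translation-equivalents table for the second. The Python sketch below only computes such features for a candidate sentence pair; the abstract does not list the paper's exact feature set, so these features, the function names, and the toy equivalents table are assumptions. The resulting vectors could then be fed to an SVM classifier (for example scikit-learn's `sklearn.svm.SVC`) trained on positive and negative sentence pairs.

```python
def _ratio(a: int, b: int) -> float:
    """Symmetric length ratio in [0, 1]; 1.0 means equal lengths."""
    return min(a, b) / max(a, b) if max(a, b) else 1.0


def length_features(src: str, tgt: str) -> list[float]:
    """First pass: features based only on sentence length."""
    return [
        _ratio(len(src), len(tgt)),                  # character-length ratio
        _ratio(len(src.split()), len(tgt.split())),  # word-count ratio
    ]


def equivalent_coverage(src: str, tgt: str, equivalents: dict[str, set[str]]) -> float:
    """Second pass: share of source words that have at least one translation
    equivalent (from a table estimated on first-pass alignments) in the target."""
    src_words = src.lower().split()
    tgt_words = set(tgt.lower().split())
    if not src_words:
        return 0.0
    hits = sum(1 for w in src_words if equivalents.get(w, set()) & tgt_words)
    return hits / len(src_words)


def pair_features(src: str, tgt: str, equivalents: dict[str, set[str]]) -> list[float]:
    """Feature vector for one candidate sentence pair (SVM input)."""
    return length_features(src, tgt) + [equivalent_coverage(src, tgt, equivalents)]
```

Keeping the length features separate lets the first pass run before any translation equivalents exist, which matches the language-pair independence the abstract asks for.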
Improved Lexical Alignment by Combining Multiple Reified Alignments
Dan Tufiş, Radu Ion, Alexandru Ceauşu, Dan Ştefănescu
11th Conference of the European Chapter of the Association for Computational Linguistics
2005
Combined Word Alignments
Dan Tufiş, Radu Ion, Alexandru Ceauşu, Dan Ştefănescu
Proceedings of the ACL Workshop on Building and Using Parallel Texts