Proceedings of the Second International Workshop on Free/Open-Source Rule-Based Machine Translation
Felipe Sánchez-Martinez | Juan Antonio Pérez-Ortiz
MOLTO: multilingual on-line translation
Aarne Ranta
FreeLing: open-source natural language processing for research and development
Lluís Padró
Towards synchronous extensible dependency grammar
Michael Gasser
Extensible Dependency Grammar (XDG; Debusmann, 2007) is a flexible, modular dependency grammar framework in which sentence analyses consist of multigraphs and processing takes the form of constraint satisfaction. This paper shows how XDG lends itself to grammar-driven machine translation and introduces the machinery necessary for synchronous XDG. Since the approach relies on a shared semantics, it resembles interlingua MT. It differs in that there are no separate analysis and generation phases. Rather, translation consists of the simultaneous analysis and generation of a single source-target “sentence”.
Taking on new challenges in multi-word unit processing for machine translation
Johanna Monti | Anabela Barreiro | Annibale Elia | Federica Marano | Antonella Napoli
This paper discusses the qualitative comparative evaluation performed on the results of two machine translation systems with different approaches to the processing of multi-word units. It proposes a solution for overcoming the difficulties multi-word units present to machine translation by adopting a methodology that combines the lexicon grammar approach with OpenLogos ontology and semantico-syntactic rules. The paper also discusses the importance of qualitative evaluation metrics to correctly evaluate the performance of machine translation engines with regard to multi-word units.
Bootstrapping a statistical speech translator from a rule-based one
Manny Rayner | Paula Estrella | Pierrette Bouillon
Maca – a configurable tool to integrate Polish morphological data
Adam Radziszewski | Tomasz Śniatowski
There are a number of morphological analysers for Polish. Most of these, however, are non-free resources. What is more, different analysers employ different tagsets and tokenisation strategies. This situation calls for a simple and universal framework to join different sources of morphological information, including the existing resources as well as user-provided dictionaries. We present such a configurable framework, which allows users to write simple configuration files that define tokenisation strategies and the behaviour of morphological analysers, including simple tagset conversion.
Automatic acquisition of named entities for rule-based machine translation
Antonio Toral | Andy Way
This paper proposes to enrich RBMT dictionaries with Named Entities (NEs) automatically acquired from Wikipedia. The method is applied to the Apertium English–Spanish system and its performance compared to that of Apertium with and without hand-tagged NEs. The system with automatic NEs outperforms the one without NEs, while results vary when compared to a system with hand-tagged NEs (results are comparable for Spanish→English but slightly worse for English→Spanish). Apart from that, adding automatic NEs contributes to decreasing the number of unknown terms by more than 10%.
Apertium advanced web interface: a first step toward interactivity and language tools convergence
Arnaud Vié | Luis Villarejo Muñoz | Mireia Farrús Cabeceran | Jimmy O’Regan
This document describes a project aimed at building a new web interface to the Apertium machine translation platform, including pre-editing and post-editing environments. It describes the work accomplished on this project so far and gives an overview of possible future work.
Rule-based machine translation between Bulgarian and Macedonian
Tihomir Rangelov
This paper describes the development of a two-way shallow-transfer rule-based machine translation system between Bulgarian and Macedonian. It gives an account of the resources and the methods used for constructing the system, including the development of monolingual and bilingual dictionaries, syntactic transfer rules and constraint grammars. The system’s performance was evaluated and compared to that of another, commercially available MT system for the two languages. Some future work is suggested.
A widely used machine translation service and its migration to a free/open-source solution: the case of Softcatalà
Xavier Ivars-Ribes | Victor M. Sánchez-Cartagena
Softcatalà is a non-profit association created more than 10 years ago to fight the marginalisation of the Catalan language in information and communication technologies. It has led the localisation of many applications and the creation of a website which allows its users to translate texts between Spanish and Catalan using an external closed-source translation engine. Recently, the closed-source translation back-end has been replaced by a free/open-source solution completely managed by Softcatalà: the Apertium machine translation platform and the ScaleMT web service framework. Thanks to the openness of the new solution, it is possible to take advantage of the large number of users of the Softcatalà translation service to improve it, using a series of methods presented in this paper. In addition, a study of the translations requested by the users has been carried out, and it shows that the translation back-end change has not affected the usage patterns.
Shallow-transfer rule-based machine translation from Czech to Polish
Joanna Ruth | Jimmy O’Regan
This article describes the development of an open-source shallow-transfer machine translation system from Czech to Polish in the Apertium platform. It gives details of the methods and resources used in constructing the system. Although the resulting system has quite a high error rate, it is still competitive with other systems.
An Italian to Catalan RBMT system reusing data from existing language pairs
Antonio Toral | Mireia Ginestí-Rosell | Francis Tyers
This paper presents an Italian→Catalan RBMT system automatically built by combining the linguistic data of the existing pairs Spanish–Catalan and Spanish–Italian. A lightweight manual postprocessing is carried out in order to fix inconsistencies in the automatically derived dictionaries and to add very frequent words that are missing according to a corpus analysis. The system is evaluated on the KDE4 corpus and outperforms Google Translate by approximately ten absolute points in terms of both TER and GTM.