Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation

Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Francis M. Tyers (Editors)

November 2-3
Alacant, Spain

Matxin: developing sustainable machine translation for a less-resourced language
Kepa Sarasola

Anusaaraka: An accessor cum machine translator
Amba Kulkarni

The Apertium machine translation platform: Five years on
Mikel L. Forcada | Francis M. Tyers | Gema Ramírez-Sánchez

This paper describes Apertium: a free/open-source machine translation platform (engine, toolbox and data), its history, its philosophy of design, its technology, the community of developers, the research and business based on it, and its prospects and challenges, now that it is five years old.

Matxin: Moving towards language independence
Aingeru Mayor | Francis M. Tyers

This paper describes some of the issues found when adapting and extending the Matxin free-software machine translation system to other language pairs. It sketches out some of the characteristics of Matxin and offers some possible solutions to these issues.

OpenLogos MT and the SAL representation language
Bernard Scott | Anabela Barreiro

This paper describes OpenLogos, a rule-driven machine translation system, and the syntactic-semantic taxonomy SAL that underlies this system. We illustrate how SAL addresses typical problems relating to source language analysis and target language synthesis. The adaptation of OpenLogos resources to a specific application concerning paraphrasing in Portuguese is also described here. References are provided for access to OpenLogos and to SAL.

Shallow-transfer rule-based machine translation for Swedish to Danish
Francis M. Tyers | Jacob Nordfalk

This article describes the development of a shallow-transfer machine translation system from Swedish to Danish in the Apertium platform. It gives details of the resources used, the methods for constructing the system and an evaluation of the translation quality. The quality is found to be comparable with that of current commercial systems, despite the particularly low coverage of the lexicons.

Reuse of free resources in machine translation between Nynorsk and Bokmål
Kevin Unhammer | Trond Trosterud

We describe the development of a two-way shallow-transfer machine translation system between Norwegian Nynorsk and Norwegian Bokmål built on the Apertium platform, using the Free and Open Source resources Norsk Ordbank and the Oslo–Bergen Constraint Grammar tagger. We detail the integration of these and other resources in the system along with the construction of the lexical and structural transfer, and evaluate the translation quality in comparison with another system. Finally, some future work is suggested.

Development of a morphological analyser for Bengali
Abu Zaher Md Faridee | Francis M. Tyers

This article describes the development of an open-source morphological analyser for Bengali using finite-state technology. First we discuss the challenges of creating a morphological analyser for a highly inflectional language like Bengali, and then propose a solution using lttoolbox, an open-source finite-state toolkit. We then evaluate the performance of the system and propose ways of improving it further.
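
As a rough illustration of what a finite-state morphological analyser does (not of lttoolbox itself), the sketch below compiles a toy paradigm-based dictionary into a letter trie, which is a deterministic acyclic finite automaton, and looks surface forms up in it. The paradigm, the transliterated forms `boi`, `boigulo` and `boite`, the tag names and the names `LetterTrie` and `compile_dictionary` are illustrative assumptions; a real toolkit such as lttoolbox also minimises its transducers and supports generation, which this sketch omits.

```python
class LetterTrie:
    """A deterministic acyclic letter automaton mapping surface forms to analyses.

    Each path from the root spells a surface form; the node reached at the end
    stores every analysis of that form (several if the form is ambiguous).
    """

    def __init__(self):
        self.children = {}
        self.analyses = []

    def add(self, surface, analysis):
        node = self
        for ch in surface:
            node = node.children.setdefault(ch, LetterTrie())
        node.analyses.append(analysis)

    def lookup(self, surface):
        node = self
        for ch in surface:
            node = node.children.get(ch)
            if node is None:
                return []  # no path for this string: unknown word
        return list(node.analyses)


def compile_dictionary(paradigms, lexicon):
    """Expand each (lemma, stem, paradigm) entry into stem+suffix forms."""
    trie = LetterTrie()
    for lemma, stem, par in lexicon:
        for suffix, tags in paradigms[par]:
            # Tags are written in an Apertium-like angle-bracket style.
            trie.add(stem + suffix, lemma + "".join(f"<{t}>" for t in tags))
    return trie


# Hypothetical toy data (transliterated, not a real Bengali dictionary).
PARADIGMS = {
    "noun": [("", ["n", "sg"]), ("gulo", ["n", "pl"]), ("te", ["n", "loc"])],
}
LEXICON = [("boi", "boi", "noun")]

analyser = compile_dictionary(PARADIGMS, LEXICON)
print(analyser.lookup("boigulo"))  # ['boi<n><pl>']
```

Factoring the suffixes into paradigms shared by many stems is what keeps dictionaries for highly inflectional languages manageable to write; the compiled automaton then makes lookup time proportional to the length of the word.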

An open-source highly scalable web service architecture for the Apertium machine translation engine
Victor M. Sánchez-Cartagena | Juan Antonio Pérez-Ortiz

Machine translation services such as the Google Ajax Language API have become very popular because they make the collaboratively created content of Web 2.0 available to speakers of many languages. Key to their success are a clear, easy-to-use application programming interface (API) and a scalable, reliable service. This paper describes a highly scalable implementation of an Apertium-based translation web service that aims to make content available to speakers of lesser-resourced languages. The API of this service is compatible with Google's, and scalability is achieved by a new architecture that allows servers to be added or removed at any time; to this end, we design an application placement algorithm that decides which language pairs should be translated on which servers. Our experiments show that the resulting architecture improves the translation rate compared with existing Apertium-based servers.
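
The abstract does not spell out the placement algorithm, so the following is only a simple greedy heuristic, swapped in to illustrate what "placing" language pairs on servers means: pairs are taken in decreasing order of expected load and assigned to the server with the most free capacity, and a pair that does not fit on one server is split (replicated) across several. The load figures, server names, pair labels and the function `place_pairs` are hypothetical, and this is not the algorithm described in the paper.

```python
import heapq


def place_pairs(demand, servers):
    """Greedy placement of language pairs onto servers (illustrative only).

    demand:  dict mapping a language pair to its expected load (e.g. requests/s)
    servers: dict mapping a server name to its capacity, in the same units
    Returns a dict mapping each server to {pair: load assigned to it}.
    Raises ValueError if total capacity is insufficient.
    """
    # Max-heap on free capacity, stored as (-free, server) for heapq's min-heap.
    heap = [(-cap, name) for name, cap in servers.items()]
    heapq.heapify(heap)
    placement = {name: {} for name in servers}

    # Place the most demanding pairs first.
    for pair, load in sorted(demand.items(), key=lambda kv: -kv[1]):
        remaining = load
        while remaining > 0:
            if not heap or -heap[0][0] <= 0:
                raise ValueError(f"not enough capacity for {pair}")
            neg_free, name = heapq.heappop(heap)
            take = min(-neg_free, remaining)  # fill this server as far as possible
            placement[name][pair] = take
            remaining -= take
            heapq.heappush(heap, (neg_free + take, name))  # updated free capacity
    return placement


# Hypothetical loads for three example pairs on two servers.
print(place_pairs({"es-ca": 70, "en-es": 50, "eu-es": 20}, {"A": 100, "B": 60}))
# {'A': {'es-ca': 70, 'eu-es': 20}, 'B': {'en-es': 50}}
```

Since servers may be added or removed at any time, one naive option is to re-run `place_pairs` with the updated `servers` dictionary; a more careful scheme might also try to limit how many pairs move between servers on each change.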

Apertium goes SOA: an efficient and scalable service based on the Apertium rule-based machine translation platform
Pasquale Minervini

Service-oriented architecture (SOA) is a paradigm for organising and using distributed services that may be under the control of different ownership domains and implemented using various technology stacks. In some contexts, an organisation whose IT infrastructure implements the SOA paradigm can benefit greatly from integrating efficient machine translation (MT) services into its business processes to overcome language barriers. This paper describes the architecture and the design patterns used to develop an MT service that is efficient, scalable and easy to integrate into new and existing business processes. The service is based on Apertium, a free/open-source rule-based machine translation platform.

A trigram part-of-speech tagger for the Apertium free/open-source machine translation platform
Zaid Md Abdul Wahab Sheikh | Felipe Sánchez-Martínez

This paper describes the implementation of a second-order hidden Markov model (HMM) based part-of-speech tagger for the Apertium free/open-source rule-based machine translation platform. We describe the part-of-speech (PoS) tagging approach in Apertium and how it is parametrised through a tagger definition file that defines: (1) the set of tags to be used and (2) constraint rules that can be used to forbid certain PoS tag sequences, thereby refining the HMM parameters and increasing tagging accuracy. The paper also reviews the Baum-Welch algorithm used to estimate the HMM parameters and compares the resulting tagging accuracy with that of the original first-order HMM-based PoS tagger in Apertium.
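
A minimal sketch of second-order (trigram) Viterbi decoding may help make the constraint-rule idea concrete. It assumes that a forbidden tag pair simply makes any transition containing that pair impossible (probability zero); the probability tables, tag names and the function name `viterbi_trigram` are hypothetical, and end-of-sentence transitions and parameter estimation (for example with Baum-Welch) are left out.

```python
import math


def viterbi_trigram(words, tags, trans, emit, forbidden, start="<s>"):
    """Second-order (trigram) Viterbi decoding with forbidden tag pairs.

    trans[(t2, t1, t)] = P(t | t2, t1)   (missing entries count as 0)
    emit[(t, w)]       = P(w | t)        (missing entries count as 0)
    forbidden: set of (t_prev, t) pairs treated as impossible transitions.
    Returns the most probable tag sequence, or None if none is possible.
    """
    def logp(p):
        return math.log(p) if p > 0 else -math.inf

    # A state is the last two tags (t_{i-1}, t_i); value = (log-prob, path).
    delta = {(start, start): (0.0, [])}
    for w in words:
        new_delta = {}
        for (t2, t1), (score, path) in delta.items():
            for t in tags:
                if (t1, t) in forbidden:
                    continue  # constraint rule: this tag pair may not occur
                s = score + logp(trans.get((t2, t1, t), 0.0)) + logp(emit.get((t, w), 0.0))
                if s == -math.inf:
                    continue  # zero-probability extension
                key = (t1, t)
                if key not in new_delta or s > new_delta[key][0]:
                    new_delta[key] = (s, path + [t])
        delta = new_delta
    if not delta:
        return None
    return max(delta.values(), key=lambda v: v[0])[1]


# Hypothetical toy model: "run" is ambiguous between NOUN and VERB.
TAGS = ["DET", "NOUN", "VERB"]
TRANS = {("<s>", "<s>", "DET"): 1.0, ("<s>", "DET", "NOUN"): 0.4, ("<s>", "DET", "VERB"): 0.6}
EMIT = {("DET", "the"): 1.0, ("NOUN", "run"): 0.5, ("VERB", "run"): 0.5}

print(viterbi_trigram(["the", "run"], TAGS, TRANS, EMIT, set()))              # ['DET', 'VERB']
print(viterbi_trigram(["the", "run"], TAGS, TRANS, EMIT, {("DET", "VERB")}))  # ['DET', 'NOUN']
```

Forbidding the pair DET followed by VERB removes the otherwise more probable path, so the tagger falls back to the NOUN reading; this is the sense in which such rules refine the model's transition probabilities.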

Joint efforts to further develop and incorporate Apertium into the document management flow at Universitat Oberta de Catalunya
Luis Villarejo Muñoz | Sergio Ortiz Rojas | Mireia Ginestí Rosell

This article describes the translation needs of the Universitat Oberta de Catalunya (UOC) and how Prompsit met them by further developing Apertium, a free rule-based machine translation system. We first describe the general framework of linguistic needs within UOC. Section 2 then introduces Apertium and outlines the development scenario that Prompsit carried out. Section 3 outlines the specific needs of UOC and why Apertium was chosen as the machine translation engine. Section 4 describes some of the features developed specifically for this project, and Section 5 explains how the linguistic data were improved to increase the quality of the output in Catalan and Spanish. Finally, we draw conclusions and outline further work arising from the project.