Najeh Hajlaoui


Editing OntoLex-Lemon in VocBench 3
Manuel Fiorelli | Armando Stellato | Tiziano Lorenzetti | Andrea Turbati | Peter Schmitz | Enrico Francesconi | Najeh Hajlaoui | Brahim Batouche
Proceedings of the Twelfth Language Resources and Evaluation Conference

OntoLex-Lemon is a collection of RDF vocabularies for specifying the verbalization of ontologies in natural language. Beyond its original scope, OntoLex-Lemon, as well as its predecessor Monnet lemon, found application in the Linguistic Linked Open Data cloud to represent and interlink language resources on the Semantic Web. Unfortunately, generic ontology and RDF editors were considered inconvenient to use with OntoLex-Lemon because of its complex design patterns and other peculiarities, including indirection, reification and subtle integrity constraints. This perception led to the development of dedicated editors, trading the flexibility of RDF in combining different models (and the features already available in existing RDF editors) for a more direct and streamlined editing of OntoLex-Lemon patterns. In this paper, we investigate the benefits gained by extending an already existing RDF editor, VocBench 3, with capabilities closely tailored to OntoLex-Lemon, and the challenges that such an extension implies. The outcome of this investigation is twofold: a vertical assessment of a new editor for OntoLex-Lemon and, in the broader scope of RDF editor design, a new perspective on the flexibility and extensibility characteristics an editor should meet in order to cover new core modeling vocabularies, for which OntoLex-Lemon represents a use case.


PMKI: an European Commission action for the interoperability, maintainability and sustainability of Language Resources
Peter Schmitz | Enrico Francesconi | Najeh Hajlaoui | Brahim Batouche
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


DCEP - Digital Corpus of the European Parliament
Najeh Hajlaoui | David Kolovratnik | Jaakko Väyrynen | Ralf Steinberger | Daniel Varga
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We are presenting a new highly multilingual document-aligned parallel corpus called DCEP - Digital Corpus of the European Parliament. It consists of various document types covering a wide range of subject domains. With a total of 1.37 billion words in 23 languages (253 language pairs), gathered in the course of ten years, this is the largest single release of documents by a European Union institution. DCEP contains most of the content of the European Parliament’s official Website. It includes different document types produced between 2001 and 2012, excluding only the documents that already exist in the Europarl corpus, to avoid overlap. We are presenting the typical acquisition steps of the DCEP corpus: data access, document alignment, sentence splitting, normalisation and tokenisation, and sentence alignment efforts. The sentence-level alignment is still in progress, but based on some first experiments, we showed that DCEP is very useful for NLP applications, in particular for Statistical Machine Translation.

SMT for restricted sublanguage in CAT tool context at the European Parliament
Najeh Hajlaoui
Proceedings of Translating and the Computer 36


Are ACT’s Scores Increasing with Better Translation Quality?
Najeh Hajlaoui
Proceedings of the Eighth Workshop on Statistical Machine Translation


Machine Translation of Labeled Discourse Connectives
Thomas Meyer | Andrei Popescu-Belis | Najeh Hajlaoui | Andrea Gesmundo
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Research Papers

This paper shows how the disambiguation of discourse connectives can improve their automatic translation, while preserving the overall performance of statistical MT as measured by BLEU. State-of-the-art automatic classifiers for rhetorical relations are used prior to MT to label discourse connectives that signal those relations. These labels are used for MT in two ways: (1) by augmenting factored translation models; and (2) by using the probability distributions of labels in order to train and tune SMT. The improvement of translation quality is demonstrated using a new semi-automated metric for discourse connectives, on the English/French WMT10 data, while BLEU scores remain comparable to non-discourse-aware systems, due to the low frequency of discourse connectives.

Translating English Discourse Connectives into Arabic: a Corpus-based Analysis and an Evaluation Metric
Najeh Hajlaoui | Andrei Popescu-Belis
Fourth Workshop on Computational Approaches to Arabic-Script-based Languages

Discourse connectives can often signal multiple discourse relations, depending on their context. The automatic identification of the Arabic translations of seven English discourse connectives shows how these connectives are differently translated depending on their actual senses. Automatic labelling of English source connectives can help a machine translation system to translate them more correctly. The corpus-based analysis of Arabic translations also enables the definition of a connective-specific evaluation metric for machine translation, which is here validated by human judges on sample English/Arabic translation data.


Predicting Machine Translation Adequacy
Lucia Specia | Najeh Hajlaoui | Catalina Hallett | Wilker Aziz
Proceedings of Machine Translation Summit XIII: Papers


Multilinguization and Personalization of NL-based Systems
Najeh Hajlaoui | Christian Boitet
Proceedings of the 4th Workshop on Cross Lingual Information Access


PolyphraZ: a Tool for the Management of Parallel Corpora
Najeh Hajlaoui | Christian Boitet
Proceedings of the Workshop on Multilingual Linguistic Resources

PolyphraZ: a tool for the quantitative and subjective evaluation of parallel corpora
Najeh Hajlaoui | Christian Boitet
Proceedings of the First International Workshop on Spoken Language Translation: Papers