Cristina Vertan


From Inscription to Semi-automatic Annotation of Maya Hieroglyphic Texts
Cristina Vertan | Christian Prager
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages

The Maya script is the only readable autochthonous writing system of the Americas and consists of more than 1000 word signs and syllables. It is only partially deciphered and is the subject of the project “Text Database and Dictionary of the Classic Maya”. Texts are recorded in TEI XML and on the basis of a digital sign and graph catalog, which are stored in the TextGrid virtual repository. Due to the state of decipherment, it is not possible to record hieroglyphic texts directly in phonemically transliterated values. The texts are therefore documented using numeric sign codes based on Eric Thompson’s catalog of the Maya script. The workflow for converting numerical transliteration into textual form involves several steps, with variable solutions possible at each step. For this purpose, the authors have developed ALMAH (“Annotator for the Linguistic Analysis of Maya Hieroglyphs”). The tool is a client application that allows semi-automatic generation of phonemic transliteration from numerical transliteration and enables multi-step linguistic annotation. Alternative readings can be entered, and two or more decipherment proposals can be processed in parallel. ALMAH is implemented in Java, is based on a graph data model, and has a user-friendly interface.


Proceedings of the Workshop on Language Technology for Digital Historical Archives
Cristina Vertan | Petya Osenova | Dimitar Iliev
Proceedings of the Workshop on Language Technology for Digital Historical Archives

Controlled Semi-automatic Annotation of Classical Ethiopic
Cristina Vertan
Proceedings of the Workshop on Language Technology for Digital Historical Archives

Preservation of cultural heritage by means of digital methods has become extremely popular in recent years. After intensive digitization campaigns, the focus is slowly moving from genuine preservation (i.e., digital archiving together with standard search mechanisms) to research-oriented usage of materials available electronically. This usage is intended to go far beyond simple reading of digitized materials; researchers should be able to gain new insights into the materials and discover new facts by means of tools relying on innovative algorithms. In this article we describe the workflow necessary for the annotation of a diachronic corpus of Classical Ethiopic, a language of essential importance for the study of Early Christianity.

Modelling linguistic vagueness and uncertainty in historical texts
Cristina Vertan
Proceedings of the Workshop on Language Technology for Digital Historical Archives

Many applications in Digital Humanities (DH) rely on annotations of the raw material. These annotations (inferred automatically or done manually) assume that labelled facts are either true or false; thus all inferences based on such annotations use Boolean logic. This contradicts the hermeneutic principles used in the humanities, in which most knowledge has a degree of truth that varies depending on the experience and world knowledge of the interpreter. In this paper we show how uncertainty and vagueness, two main features of any historical text, can be encoded in annotations and thus be considered by DH applications.


Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe
Anca Dinu | Petya Osenova | Cristina Vertan
Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe

On the annotation of vague expressions: a case study on Romanian historical texts
Anca Dinu | Walther von Hahn | Cristina Vertan
Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe

Current approaches in Digital Humanities tend to ignore a central aspect of any hermeneutic introspection: the intrinsic vagueness of the analyzed texts. Especially when dealing with historical documents, neglecting vagueness has important implications for the interpretation of the results. In this paper we present the current limitations of annotation approaches and describe a current methodology for annotating vagueness in historical Romanian texts.


Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects
Preslav Nakov | Marcos Zampieri | Petya Osenova | Liling Tan | Cristina Vertan | Nikola Ljubešić | Jörg Tiedemann
Proceedings of the Joint Workshop on Language Technology for Closely Related Languages, Varieties and Dialects


Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)
Kalliopi Zervanou | Cristina Vertan | Antal van den Bosch | Caroline Sporleder
Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH)

Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants
Preslav Nakov | Petya Osenova | Cristina Vertan
Proceedings of the EMNLP’2014 Workshop on Language Technology for Closely Related Languages and Language Variants

Proceedings of the Workshop on Automatic Text Simplification - Methods and Applications in the Multilingual Society (ATS-MA 2014)
Constantin Orasan | Petya Osenova | Cristina Vertan
Proceedings of the Workshop on Automatic Text Simplification - Methods and Applications in the Multilingual Society (ATS-MA 2014)

Making historical texts accessible to everybody
Cristina Vertan | Walther von Hahn
Proceedings of the Workshop on Automatic Text Simplification - Methods and Applications in the Multilingual Society (ATS-MA 2014)


A New Syntactic Metric for Evaluation of Machine Translation
Melania Duma | Cristina Vertan | Wolfgang Menzel
51st Annual Meeting of the Association for Computational Linguistics Proceedings of the Student Research Workshop

Proceedings of the Workshop on Adaptation of Language Resources and Tools for Closely Related Languages and Language Variants
Cristina Vertan | Milena Slavcheva | Petya Osenova
Proceedings of the Workshop on Adaptation of Language Resources and Tools for Closely Related Languages and Language Variants

Language diversity and implications for Language technology in the Multilingual Europe
Cristina Vertan | Walther von Hahn
Proceedings of the Workshop on Adaptation of Language Resources and Tools for Closely Related Languages and Language Variants


Same domain different discourse style - A case study on Language Resources for data-driven Machine Translation
Monica Gavrila | Walther v. Hahn | Cristina Vertan
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Data-driven machine translation (MT) approaches have become very popular in recent years, especially for language pairs for which it is difficult to find specialists to develop transfer rules. Statistical (SMT) or example-based (EBMT) systems can provide reasonable translation quality for assimilation purposes, as long as a large amount of training data is available. SMT systems in particular rely on parallel aligned corpora, which have to be statistically relevant for the given language pair. The construction of large domain-specific parallel corpora is time-consuming and costly; current practice relies on one or two such large corpora per language pair. Recently developed strategies ensure a certain portability to other domains through specialized lexicons or small domain-specific corpora. In this paper we discuss the influence of different discourse styles on statistical machine translation systems. We investigate how a pure SMT system performs when training and test data belong to the same domain but the discourse style varies.

Two approaches for integrating translation and retrieval in real applications
Cristina Vertan
Proceedings of the Joint Workshop on Exploiting Synergies between Information Retrieval and Machine Translation (ESIRMT) and Hybrid Approaches to Machine Translation (HyTra)

Harnessing NLP Techniques in the Processes of Multilingual Content Management
Anelia Belogay | Diman Karagyozov | Svetla Koeva | Cristina Vertan | Adam Przepiórkowski | Dan Cristea | Plovios Raxis
Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics


Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage
Cristina Vertan | Milena Slavcheva | Petya Osenova | Stelios Piperidis
Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage

Using Manual and Parallel Aligned Corpora for Machine Translation Services within an On-line Content Management System
Cristina Vertan | Monica Gavrila
Proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora

Training Data in Statistical Machine Translation - the More, the Better?
Monica Gavrila | Cristina Vertan
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011


Towards the Integration of Language Tools Within Historical Digital Libraries
Cristina Vertan
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In recent years, mass digitization campaigns have made catalogues, valuable rare manuscripts, and old printed books available via the Internet. The Manuscriptorium digital library has ingested hundreds of volumes, and this number is expected to grow in the coming years. Other European initiatives, such as Europeana and Monasterium, also have the online presentation of cultural heritage as a central activity. As the number of volumes available online has grown, special attention has been paid to the management and retrieval of documents within digital libraries. Enabling semantic technologies and intelligent linking and search are a big step forward, but they still do not succeed in making the content of old rare books intelligible to the broad public or to specialists in other domains or languages. In this paper we argue that multilingual language technologies have the potential to fill this gap. We give an overview of the existing language resources for historical documents and present an architecture that aims at presenting such texts to the ordinary user without altering the character of the texts.


Proceedings of the Workshop Multilingual resources, technologies and evaluation for central and Eastern European languages
Elena Paskaleva | Stelios Piperidis | Milena Slavcheva | Cristina Vertan
Proceedings of the Workshop Multilingual resources, technologies and evaluation for central and Eastern European languages

ProLiV - a Tool for Teaching by Viewing Computational Linguistics
Monica Gavrila | Cristina Vertan
Proceedings of the ACL-IJCNLP 2009 Software Demonstrations


Example based machine translation for cross-lingual information retrieval
Cristina Vertan
Proceedings of Machine Translation Summit XI: Tutorials


MANAGELEX and the Semantic Web
Monica Gavrila | Cristina Vertan
Proceedings of OntoLex 2005 - Ontologies and Lexical Resources

Cross-lingual Retrieval in Semantic Web
Cristina Vertan
Workshop on Semantic Web technologies for machine translation

Natural language is considered the friendliest means of man-machine communication. However, the implementation of natural language interfaces often faces a lack of linguistic and world knowledge, especially when the application domain is not very specific. This is exactly the case for Web-based applications, which aim to support the retrieval of information in everyday areas of work. Recent Semantic Web activities have led to the development of large ontologies for a broad spectrum of domains, as well as of mechanisms for annotating resources with semantic information. In this paper we present a new architecture that aims to bring together the advantages of natural language querying and the power of the Semantic Web. We also show how the described application can be easily adapted to other domains.

Challenges for the Multilingual Semantic Web
Walther v. Hahn | Cristina Vertan
Workshop on Semantic Web technologies for machine translation

In this paper we give an overview of Semantic Web technologies and their impact on the multilingual Web. We present a possible solution for improving the quality of online translation systems, using mechanisms and standards from the Semantic Web. We focus on example-based machine translation and on automating the extraction of translation examples by means of RDF repositories.


Language Resources for the Semantic Web – perspectives for Machine Translation –
Cristina Vertan
Proceedings of the Second International Workshop on Language Resources for Translation Work, Research and Training


Specification and evaluation of machine translation toy systems - criteria for laboratory assignments
Cristina Vertan | Walther von Hahn
Workshop on Teaching Translation Technologies and Tools

Implementation of machine translation “toy” systems is a good practical exercise, especially for computer science students. Our aim in a series of courses on MT in 2002 was to make students familiar with typical problems of machine translation in particular and natural language processing in general, as well as with software implementation. In order to simulate a software implementation process as realistically as possible, we introduced more than 20 evaluation criteria to be filled in by the students when they evaluated their own products. The criteria go far beyond such “toy” systems, but they should demonstrate to the students what a real software evaluation means and what the particularities of machine translation evaluation are.

Menu choice translation: a flexible menu-based controlled natural language system
Cristina Vertan | Walther von Hahn
EAMT Workshop: Improving MT through other language technology tools: resources and tools for building MT


Architectures of “toy” systems for teaching machine translation
Walther v. Hahn | Cristina Vertan
Proceedings of the 6th EAMT Workshop: Teaching Machine Translation