Cristina Vertan

2022

From Inscription to Semi-automatic Annotation of Maya Hieroglyphic Texts
Cristina Vertan | Christian Prager
Proceedings of the Second Workshop on Language Technologies for Historical and Ancient Languages

The Maya script is the only readable autochthonous writing system of the Americas and consists of more than 1000 word signs and syllables. It is only partially deciphered and is the subject of the project “Text Database and Dictionary of the Classic Maya”. Texts are recorded in TEI XML and on the basis of a digital sign and graph catalog, which are stored in the TextGrid virtual repository. Due to the state of decipherment, it is not possible to record hieroglyphic texts directly in phonemically transliterated values. The texts are therefore documented numerically using numeric sign codes based on Eric Thompson’s catalog of the Maya script. The workflow for converting numerical transliteration into textual form involves several steps, with variable solutions possible at each step. For this purpose, the authors have developed ALMAH, the “Annotator for the Linguistic Analysis of Maya Hieroglyphs”. The tool is a client application that allows semi-automatic generation of phonemic transliteration from numerical transliteration and enables multi-step linguistic annotation. Alternative readings can be entered, and two or more decipherment proposals can be processed in parallel. ALMAH is implemented in Java, is based on a graph data model, and has a user-friendly interface.

2019

Proceedings of the Workshop on Language Technology for Digital Historical Archives
Cristina Vertan | Petya Osenova | Dimitar Iliev
Proceedings of the Workshop on Language Technology for Digital Historical Archives

Controlled Semi-automatic Annotation of Classical Ethiopic
Cristina Vertan
Proceedings of the Workshop on Language Technology for Digital Historical Archives

Preservation of cultural heritage by means of digital methods has become extremely popular in recent years. After intensive digitization campaigns, the focus is slowly moving from genuine preservation (i.e., digital archiving together with standard search mechanisms) to research-oriented usage of materials available electronically. This usage is intended to go far beyond simple reading of digitized materials; researchers should be able to gain new insights into materials and discover new facts by means of tools relying on innovative algorithms. In this article we describe the workflow necessary for the annotation of a diachronic corpus of Classical Ethiopic, a language of essential importance for the study of Early Christianity.

Modelling linguistic vagueness and uncertainty in historical texts
Cristina Vertan
Proceedings of the Workshop on Language Technology for Digital Historical Archives

Many applications in Digital Humanities (DH) rely on annotations of the raw material. These annotations (inferred automatically or done manually) assume that labelled facts are either true or false, so all inferences based on such annotations use Boolean logic. This contradicts the hermeneutic principles used in the humanities, in which most knowledge has a degree of truth that varies depending on the experience and the world knowledge of the interpreter. In this paper we show how uncertainty and vagueness, two main features of any historical text, can be encoded in annotations and thus be considered by DH applications.

2017

Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe
Anca Dinu | Petya Osenova | Cristina Vertan
Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe

On the annotation of vague expressions: a case study on Romanian historical texts
Anca Dinu | Walther von Hahn | Cristina Vertan
Proceedings of the First Workshop on Language technology for Digital Humanities in Central and (South-)Eastern Europe

Current approaches in Digital Humanities tend to ignore a central aspect of any hermeneutic introspection: the intrinsic vagueness of analyzed texts. Especially when dealing with historical documents, neglecting vagueness has important implications for the interpretation of the results. In this paper we present current limitations of annotation approaches and describe a current methodology for annotating vagueness in historical Romanian texts.

Data-driven machine translation (MT) approaches have become very popular in recent years, especially for language pairs for which it is difficult to find specialists to develop transfer rules. Statistical (SMT) or example-based (EBMT) systems can provide reasonable translation quality for assimilation purposes, as long as a large amount of training data is available. SMT systems in particular rely on parallel aligned corpora, which have to be statistically relevant for the given language pair. The construction of large domain-specific parallel corpora is time-consuming and costly; current practice relies on one or two such big corpora per language pair. Recently developed strategies ensure a certain portability to other domains through specialized lexicons or small domain-specific corpora. In this paper we discuss the influence of different discourse styles on statistical machine translation systems. We investigate how a pure SMT system performs when training and test data belong to the same domain but the discourse style varies.

Harnessing NLP Techniques in the Processes of Multilingual Content Management
Anelia Belogay | Diman Karagyozov | Svetla Koeva | Cristina Vertan | Adam Przepiórkowski | Dan Cristea | Plovios Raxis
Proceedings of the Demonstrations at the 13th Conference of the European Chapter of the Association for Computational Linguistics

2011

Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage
Cristina Vertan | Milena Slavcheva | Petya Osenova | Stelios Piperidis
Proceedings of the Workshop on Language Technologies for Digital Humanities and Cultural Heritage

Using Manual and Parallel Aligned Corpora for Machine Translation Services within an On-line Content Management System
Cristina Vertan | Monica Gavrila
Proceedings of the Second Workshop on Annotation and Exploitation of Parallel Corpora

Training Data in Statistical Machine Translation - the More, the Better?
Monica Gavrila | Cristina Vertan
Proceedings of the International Conference Recent Advances in Natural Language Processing 2011

2010

Towards the Integration of Language Tools Within Historical Digital Libraries
Cristina Vertan
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

In recent years, the campaign of mass digitization has made catalogues, valuable rare manuscripts, and old printed books available via the Internet. The Manuscriptorium digital library has ingested hundreds of volumes, and this number is expected to grow in the coming years. Other European initiatives such as Europeana and Monasterium also have the online presentation of cultural heritage as a central activity. As the number of volumes available on-line has grown, special attention has been paid to the management and retrieval of documents within digital libraries. Enabling semantic technologies and intelligent linking and search is a big step forward, but these still do not succeed in making the content of old rare books intelligible to the broad public or to specialists in other domains or languages. In this paper we argue that multilingual language technologies have the potential to fill this gap. We give an overview of the existing language resources for historical documents and present an architecture that aims at presenting such texts to the normal user without altering the character of the texts.

2009

ProLiV - a Tool for Teaching by Viewing Computational Linguistics
Monica Gavrila | Cristina Vertan
Proceedings of the ACL-IJCNLP 2009 Software Demonstrations

Proceedings of the Workshop Multilingual resources, technologies and evaluation for central and Eastern European languages
Elena Paskaleva | Stelios Piperidis | Milena Slavcheva | Cristina Vertan
Proceedings of the Workshop Multilingual resources, technologies and evaluation for central and Eastern European languages

2007

Example based machine translation for cross-lingual information retrieval
Cristina Vertan
Proceedings of Machine Translation Summit XI: Tutorials

2005

MANAGELEX and the Semantic Web
Monica Gavrila | Cristina Vertan
Proceedings of OntoLex 2005 - Ontologies and Lexical Resources

Cross-lingual Retrieval in Semantic Web
Cristina Vertan
Workshop on Semantic Web technologies for machine translation

Natural language is considered the friendliest means of man-machine communication. However, the implementation of natural language interfaces often faces the problem of a lack of linguistic and world knowledge, especially when the application domain is not very specific. This is exactly the case for Web-based applications, which aim to support the retrieval of information in everyday areas of work. Recent Semantic Web activities have led to the development of large ontologies for a broad spectrum of domains, as well as of mechanisms for annotating resources with semantic information. In this paper we present a new architecture aiming to bring together the advantages of natural language querying and the power of the Semantic Web. We also show how the described application can be easily adapted to other domains.

Challenges for the Multilingual Semantic Web
Walther v. Hahn | Cristina Vertan
Workshop on Semantic Web technologies for machine translation

In this paper we give an overview of Semantic Web technologies and their impact on the multilingual Web. We present a possible solution for improving the quality of on-line translation systems, using mechanisms and standards from the Semantic Web. We focus on example-based machine translation and the automation of translation example extraction by means of RDF repositories.

2004

Language Resources for the Semantic Web – perspectives for Machine Translation –
Cristina Vertan
Proceedings of the Second International Workshop on Language Resources for Translation Work, Research and Training

2003

Menu choice translation: a flexible menu-based controlled natural language system
Cristina Vertan | Walther von Hahn
EAMT Workshop: Improving MT through other language technology tools: resources and tools for building MT

Specification and evaluation of machine translation toy systems - criteria for laboratory assignments
Cristina Vertan | Walther von Hahn
Workshop on Teaching Translation Technologies and Tools

Implementation of machine translation “toy” systems is a good practical exercise, especially for computer science students. Our aim in a series of courses on MT in 2002 was to make students familiar both with typical problems of machine translation in particular and natural language processing in general, and with software implementation. In order to simulate a software implementation process as realistically as possible, we introduced more than 20 evaluation criteria to be filled in by the students when they evaluated their own products. The criteria go far beyond such “toy” systems, but they should demonstrate to the students what a real software evaluation means and what the particularities of machine translation evaluation are.