Ana-Maria Barbu


Aligning the Romanian Reference Treebank and the Valence Lexicon of Romanian Verbs
Ana-Maria Barbu | Verginica Barbu Mititelu | Cătălin Mititelu
Proceedings of the Thirteenth Language Resources and Evaluation Conference

We present our efforts to align two language resources for Romanian, the Romanian Reference Treebank and the Valence Lexicon of Romanian Verbs. For each treebank occurrence of a verb that has an entry in the lexicon, a set of valence frames is automatically assigned, then manually validated and, when necessary, corrected. Validating a valence frame also means semantically disambiguating the verb in the respective context. The validation is done by two linguists on complementary datasets; however, a subset of verbs was validated by both annotators, and Cohen’s κ for this subset is 0.87. The alignment also serves as a method of enhancing the quality of the two resources, since in the process we identify morpho-syntactic annotation mistakes and incomplete or missing valence frames. Information from each resource complements the information from the other, which increases the value of both. The treebank and the lexicon are freely available, and the links discovered between them are also made available on GitHub.


Romanian Lexical Data Bases: Inflected and Syllabic Forms Dictionaries
Ana-Maria Barbu
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper presents two lexical databases for Romanian: RoMorphoDict, a dictionary of inflected forms, and RoSyllabiDict, a dictionary of syllabified inflected forms. Each database is available in two Unicode formats: text and XML. In text format, an entry of RoMorphoDict contains an inflected form, its lemma, its morpho-syntactic description and a marking of the stressed vowel in pronunciation; in XML format, an entry represents the whole paradigm of a word and additionally contains information about roots and paradigm class. An entry of RoSyllabiDict, in both formats, contains the unsyllabified word, its syllabified counterpart, grammatical information and/or the type of syllabification, where applicable. The stressed vowel is also marked on the syllabified form. Each lexical database includes the corresponding inflected forms of about 65,000 lemmas, that is, over 700,000 entries in RoMorphoDict and over 500,000 entries in RoSyllabiDict. Both resources are available for free. The paper describes in detail the content of these databases and the procedure for building them.


Romanian Valence Dictionary in XML Format
Ana-Maria Barbu | Emil Ionescu | Verginica Barbu Mititelu
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

Valence dictionaries are dictionaries in which logical predicates (most often verbs) are inventoried along with semantic and syntactic information on the roles of the arguments they combine with, as well as the syntactic restrictions these arguments have to obey. In this article we present the incipient stage of the project “Syntactic and semantic database in XML format: an HPSG representation of verb valences in Romanian”. Its aim is the development of a valence dictionary in XML format for a set of 3000 Romanian verbs. Valences are specified for each sense of each verb, along with an illustrative example, possible argument alternations and a set of multiword expressions in which the verb occurs with that sense. The grammatical formalism we use is Head-driven Phrase Structure Grammar, which offers one of the most comprehensive frameworks for encoding various types of linguistic information for lexical items. XML is the most appropriate mark-up language for describing information structured in the HPSG framework. The project can be further extended to cover all Romanian verbs (around 7000) as well as other predicates (nouns, adjectives, prepositions).


A Word Alignment System Based on a Translation Equivalence Extractor
Ana-Maria Barbu
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)


TREQ-AL: A word alignment system with limited language resources
Dan Tufiş | Ana-Maria Barbu | Radu Ion
Proceedings of the HLT-NAACL 2003 Workshop on Building and Using Parallel Texts: Data Driven Machine Translation and Beyond


Lexical token alignment: experiments, results and applications
Dan Tufiş | Ana-Maria Barbu
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)