2008
Eksairesis: A Domain-Adaptable System for Ontology Building from Unstructured Text
Katia Lida Kermanidis | Aristomenis Thanopoulos | Manolis Maragoudakis | Nikos Fakotakis
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
This paper describes Eksairesis, a system for learning economic domain knowledge automatically from Modern Greek text. The knowledge is in the form of economic terms and the semantic relations that govern them. The entire process is based on the use of minimal language-dependent tools, no external linguistic resources, and merely free, unstructured text. The methodology is thereby easily portable to other domains and other languages. The text is pre-processed with basic morphological annotation, and semantic (named and other) entities are identified using supervised learning techniques. Statistical filtering, i.e. corpora comparison, is used to extract domain terms, and supervised learning is again employed to detect the semantic relations between pairs of terms. Advanced classification schemata, ensemble learning, and one-sided sampling are experimented with in order to deal with the noise in the data, which is unavoidable due to the low pre-processing level and the lack of sophisticated resources. An average F-score of 68.5% over all the classes is achieved when learning semantic relations. Bearing in mind the use of minimal resources and the highly automated nature of the process, classification performance is very promising compared to results reported in previous work.
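As a rough illustration of term extraction by corpora comparison, the sketch below keeps words whose relative frequency in a domain corpus is well above their relative frequency in a general corpus. The abstract does not name the statistic used, so the frequency ratio, the add-one smoothing, the `min_ratio` threshold, and the function name `domain_terms` are all assumptions made for this example, not the Eksairesis method.

```python
from collections import Counter

def domain_terms(domain_tokens, general_tokens, min_ratio=2.0):
    """Return words over-represented in the domain corpus, best first.

    Hypothetical scoring: relative frequency in the domain corpus divided
    by a smoothed relative frequency in the general corpus.
    """
    domain_counts = Counter(domain_tokens)
    general_counts = Counter(general_tokens)
    n_domain = len(domain_tokens)
    n_general = len(general_tokens)

    scores = {}
    for word, count in domain_counts.items():
        rel_domain = count / n_domain
        # Add-one smoothing so words unseen in the general corpus
        # do not cause a division by zero.
        rel_general = (general_counts[word] + 1) / (n_general + 1)
        scores[word] = rel_domain / rel_general

    kept = [w for w, s in scores.items() if s >= min_ratio]
    return sorted(kept, key=lambda w: scores[w], reverse=True)

domain = "bank rate bank loan the the".split()
general = "the the the cat sat the".split()
print(domain_terms(domain, general))
```

In practice a statistic such as a log-likelihood ratio might be preferred over a plain frequency ratio, since it is less sensitive to very rare words.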
2006
Dealing with Imbalanced Data using Bayesian Techniques
Manolis Maragoudakis | Katia Kermanidis | Aristogiannis Garbis | Nikos Fakotakis
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
In the present work, we deal with the significant problem of high data imbalance in binary or multi-class classification problems. We study two different linguistic applications. The former determines whether a syntactic construction (environment) that co-occurs with a verb in a natural text corpus constitutes a subcategorization frame of the verb or not. The latter is Named Entity Recognition (NER), and it concerns determining whether a noun belongs to a specific Named Entity class. Regarding the subcategorization domain, each environment is encoded as a vector of heterogeneous attributes, where a very high imbalance between positive and negative examples is observed (an imbalance ratio of approximately 1:80). In the NER application, the imbalance between a named entity class and the negative class is even greater (1:120). In order to confront the plethora of negative instances, we suggest a search tactic during the training phase that employs Tomek links for removing unnecessary negative examples from the training set. Regarding the classification mechanism, we argue that Bayesian networks are well suited and we propose a novel network structure which efficiently handles heterogeneous attributes without discretization and is more classification-oriented. Compared with the experimental results of other known machine learning algorithms, our methodology performs significantly better in detecting examples of the rare class.
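The Tomek-link idea mentioned in the abstract can be sketched as follows: two examples form a Tomek link when they carry different labels and each is the other's nearest neighbour, and the negative member of each link is dropped from the training set. This is a minimal sketch under assumptions, not the authors' search tactic: it uses Euclidean distance on purely numeric vectors (the paper works with heterogeneous attributes), and the function names and the `negative=0` label convention are hypothetical.

```python
from math import dist

def nearest_neighbor(i, points):
    """Index of the point closest to points[i], excluding i itself."""
    return min((j for j in range(len(points)) if j != i),
               key=lambda j: dist(points[i], points[j]))

def tomek_links(points, labels):
    """Pairs (i, j), i < j, that are mutual nearest neighbours with different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]

def remove_negative_tomek(points, labels, negative=0):
    """Drop the negative-class member of every Tomek link."""
    drop = {k for pair in tomek_links(points, labels)
            for k in pair if labels[k] == negative}
    keep = [k for k in range(len(points)) if k not in drop]
    return [points[k] for k in keep], [labels[k] for k in keep]

points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 6.0), (9.0, 9.0)]
labels = [1, 0, 0, 0, 0]  # one positive example, four negatives
print(tomek_links(points, labels))
print(remove_negative_tomek(points, labels))
```

The brute-force neighbour search is quadratic in the number of examples; with imbalance ratios like those reported above, a spatial index would likely be needed on real data.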
2004
A Bayesian Model for Shallow Syntactic Parsing of Natural Language Texts
Manolis Maragoudakis | Nikos Fakotakis | George Kokkinakis
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
In the present work, we introduce and evaluate a novel Bayesian shallow syntactic parser that robustly detects subject-object pairs and subject-direct object-indirect object triples for a given verb in a natural language sentence. The shallow parser infers the correct subject-object pairs based on knowledge provided by Bayesian network learning from annotated text corpora. The DELOS corpus, a collection of economic domain texts that has been automatically annotated using various morphological and syntactic tools, was used as training material. Our shallow parser makes use of limited linguistic input. More specifically, we consider only part-of-speech tagging, the voice and the mood of the verb, and the head word of a noun phrase. For the task of detecting the head word of a phrase we used a sentence boundary detector. Identifying the head word of a noun phrase, i.e. the word that holds the morphological information (case, number) of the whole phrase, also proves to be very helpful for our task, as its morphological tag is all the information that is needed regarding the phrase. The proposed method was evaluated against three other machine learning techniques, namely naive Bayes, k-Nearest Neighbor and Support Vector Machines, methods that have previously been applied to natural language processing tasks with satisfactory results. The experimental results show satisfactory performance of our proposed shallow parser, which reaches almost 92 per cent in terms of precision.
Learning to Predict Pitch Accents Using Bayesian Belief Networks for Greek Language
Panagiotis Zervas | Manolis Maragoudakis | Nikos Fakotakis | George Kokkinakis
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
Bayesian Semantics Incorporation to Web Content for Natural Language Information Retrieval
Manolis Maragoudakis | Nikos Fakotakis
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)
Learning Greek Verb Complements: Addressing the Class Imbalance
Katia Kermanidis | Manolis Maragoudakis | Nikos Fakotakis | George Kokkinakis
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics
2002
Combining Bayesian and Support Vector Machines Learning to automatically complete Syntactical Information for HPSG-like Formalisms
Manolis Maragoudakis | Katia Kermanidis | Nikos Fakotakis | George Kokkinakis
Proceedings of the Third International Conference on Language Resources and Evaluation (LREC’02)