Constantine D. Spyropoulos

Also published as: Constantine Spyropoulos

2008

pdf bib abs
Segmenting HTML pages using visual, semantic information
Georgios Petasis | Pavlina Fragkou | Aris Theodorakos | Vangelis Karkaletsis | Constantine D. Spyropoulos
Proceedings of the 4th Web as Corpus Workshop

The information explosion of the Web aggravates the problem of effective information retrieval. Even though linguistic approaches found in the literature perform linguistic annotation by creating metadata in the form of tokens, lemmas or part of speech tags, however, this process is insufficient. This is due to the fact that these linguistic metadata do not exploit the actual content of the page, leading to the need of performing semantic annotation based on a predefined semantic model. This paper proposes a new learning approach for performing automatic semantic annotation. This is the result of a two step procedure: the first step partitions a web page into blocks based on its visual layout, while the second, performs subsequent partitioning based on the examination of appearance of specific types of entities denoting the semantic category as well as the application of a number of simple heuristics. Preliminary experiments performed on a manually annotated corpus regarding athletics proved to be very promising.

pdf bib abs
BOEMIE Ontology-Based Text Annotation Tool
Pavlina Fragkou | Georgios Petasis | Aris Theodorakos | Vangelis Karkaletsis | Constantine Spyropoulos
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

The huge amount of the available information in the Web creates the need of effective information extraction systems that are able to produce metadata that satisfy users information needs. The development of such systems, in the majority of cases, depends on the availability of an appropriately annotated corpus in order to learn extraction models. The production of such corpora can be significantly facilitated by annotation tools that are able to annotate, according to a defined ontology, not only named entities but most importantly relations between them. This paper describes the BOEMIE ontology-based annotation tool which is able to locate blocks of text that correspond to specific types of named entities, fill tables corresponding to ontology concepts with those named entities and link the filled tables based on relations defined in the domain ontology. Additionally, it can perform annotation of blocks of text that refer to the same topic. The tool has a user-friendly interface, supports automatic pre-annotation, annotation comparison as well as customization to other annotation schemata. The annotation tool has been used in a large scale annotation task involving 3,000 web pages regarding athletics. It has also been used in another annotation task involving 503 web pages with medical information, in different languages.

2003

This paper describes the work that was undertaken in the Glossasoft project in the area of terminology management. Some of the draw-backs of existing terminology management systems are outlined and an alternative approach to maintaining terminological data is proposed. The approach which we advocate relies on knowledge-based representation techniques. These are used to model conceptual knowledge about the terms included in the database, general knowledge about the subject domain, application-specific knowledge, and - of course - language-specific terminological knowledge. We consider the multifunctionality of the proposed architecture to be one of its major advantages. To illustrate this, we outline how the knowledge representation scheme, which we suggest, could be drawn upon in message generation and machine-assisted translation.

Co-authors

Venues

lrec3
acl1
eamt1
emnlp1
naacl1
show all...

wac1

ws1

Fix data