Sergio Espeja


2014

Ranking Job Offers for Candidates: learning hidden knowledge from Big Data
Marc Poch | Núria Bel | Sergio Espeja | Felipe Navío
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper presents a system for suggesting a ranked list of appropriate vacancy descriptions to job seekers on a job board web site. In particular, our work has explored the use of supervised classifiers with the objective of learning implicit relations which cannot be found with similarity- or pattern-based search methods that rely only on explicit information. Skills, names of professions and degrees, among other examples, are expressed in different languages and show high variation, and using ad hoc resources to trace the relations is very costly. This implicit information is unveiled when a candidate applies for a job, and it can therefore be used for learning a model to predict new cases. The results of our experiments, which combine different clustering, classification and ranking methods, show the validity of the approach.

2008

Automatic Acquisition for low frequency lexical items
Núria Bel | Sergio Espeja | Montserrat Marimon
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

This paper addresses a specific case of the task of lexical acquisition, understood as the induction of information about the linguistic characteristics of lexical items on the basis of information gathered from their occurrences in texts. Most recent work in the area of lexical acquisition has used methods that take as much textual data as possible as a source of evidence, but their performance decreases notably when only a few occurrences of a word are available. The importance of covering such low frequency items lies in the fact that a large proportion of the words in any particular collection of texts occur only a few times, if not just once. Our work proposes to compensate for the lack of information by resorting to linguistic knowledge about the characteristics of lexical classes. This knowledge, obtained from a lexical typology, is formulated probabilistically to be used in a Bayesian method that maximizes the information gathered from single occurrences so as to predict the full set of characteristics of the word. Our results show that our method achieves better results than others for the treatment of low frequency items.

COLDIC, a Lexicographic Platform for LMF compliant lexica
Núria Bel | Sergio Espeja | Montserrat Marimon | Marta Villegas
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Despite the importance of lexical resources for a number of NLP applications (Machine Translation, Information Extraction, Question Answering, among others), there has been a traditional lack of generic tools for the creation, maintenance and management of computational lexica. The most direct obstacle to the development of generic tools, independent of any particular application format, was the lack of standards for the description and encoding of lexical resources. The availability of the Lexical Markup Framework (LMF) has changed this scenario and has made possible the development of generic lexical platforms. COLDIC is a generic platform for working with computational lexica. The system has been designed to let the user concentrate on lexicographical tasks while remaining autonomous in the management of the tools. The creation and maintenance of the database, which is the core of the tool, demand no specific training in databases. An LMF compliant schema implemented in a Document Type Definition (DTD) describing the lexical resources is taken by the system to automatically configure the platform. In addition, the most standard web services for interoperability are generated automatically. Other components of the platform include built-in functions supporting the most common tasks of lexicographic work.

2007

Automatic Acquisition of Grammatical Types for Nouns
Núria Bel | Sergio Espeja | Montserrat Marimon
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Companion Volume, Short Papers

The Spanish Resource Grammar: Pre-processing Strategy and Lexical Acquisition
Montserrat Marimon | Núria Bel | Sergio Espeja | Natalia Seghezzi
ACL 2007 Workshop on Deep Linguistic Processing

2006

New tools for the encoding of lexical data extracted from corpus
Núria Bel | Sergio Espeja | Montserrat Marimon
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC'06)

This paper describes the methodology and tools that are the basis of our platform AAILE. AAILE has been built to supply those working on the construction of lexicons for syntactic parsing with more efficient ways of visualizing and analyzing data extracted from corpora. The platform offers support using techniques such as similarity measures, clustering and pattern classification.