Yannis Korkontzelos


2016

pdf
Ensemble Classification of Grants using LDA-based Features
Yannis Korkontzelos | Beverley Thomas | Makoto Miwa | Sophia Ananiadou
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Classifying research grants into useful categories is a vital task for a funding body to give structure to the portfolio for analysis, informing strategic planning and decision-making. Automating this classification process would save time and effort, providing the accuracy of the classifications is maintained. We employ five classification models to classify a set of BBSRC-funded research grants in 21 research topics based on unigrams, technical terms and Latent Dirichlet Allocation models. To boost precision, we investigate methods for combining their predictions into five aggregate classifiers. Evaluation confirmed that ensemble classification models lead to higher precision. It was observed that there is not a single best-performing aggregate method for all research topics. Instead, the best-performing method for a research topic depends on the number of positive training instances available for this topic. Subject matter experts considered the predictions of aggregate models to correct erroneous or incomplete manual assignments.

pdf
Identifying Content Types of Messages Related to Open Source Software Projects
Yannis Korkontzelos | Paul Thompson | Sophia Ananiadou
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Assessing the suitability of an Open Source Software project for adoption requires not only an analysis of aspects related to the code, such as code quality, frequency of updates and new version releases, but also an evaluation of the quality of support offered in related online forums and issue trackers. Understanding the content types of forum messages and issue trackers can provide information about the extent to which requests are being addressed and issues are being resolved, the percentage of issues that are not being fixed, the cases where the user acknowledged that the issue was successfully resolved, etc. These indicators can provide potential adopters of the OSS with estimates about the level of available support. We present a detailed hierarchy of content types of online forum messages and issue tracker comments and a corpus of messages annotated accordingly. We discuss our experiments to classify forum messages and issue tracker comments into content-related classes, i.e.~to assign them to nodes of the hierarchy. The results are very encouraging.