Evangelos Milios

2022

SNLP at TextGraphs 2022 Shared Task: Unsupervised Natural Language Premise Selection in Mathematical Texts Using Sentence-MPNet
Paul Trust | Provia Kadusabe | Haseeb Younis | Rosane Minghim | Evangelos Milios | Ahmed Zahran
Proceedings of TextGraphs-16: Graph-based Methods for Natural Language Processing

This paper describes our submission to the TextGraphs 2022 shared task at COLING 2022: Natural Language Premise Selection (NLPS) from mathematical texts. NLPS is the task of selecting, from a knowledge base written in natural language and mathematical formulae, the mathematical statements (called premises) that are most likely to be used in a given mathematical proof. We formulated this as an unsupervised semantic similarity task: we first obtained contextualized embeddings of both the premises and the mathematical proofs using sentence transformers, then computed the cosine similarity between premise and proof embeddings, and finally selected the premises with the highest cosine scores as the most probable. Our system improves over the baseline, which uses bag-of-words models based on term frequency-inverse document frequency (TF-IDF), by about 23.5% in mean average precision (MAP): 0.1516 versus 0.1228.
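The ranking step described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the three-dimensional vectors are made-up stand-ins for sentence-transformer embeddings, and the names `cosine`, `rank_premises`, and `premise_a` through `premise_c` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_u * norm_v)


def rank_premises(proof_vec, premise_vecs, top_k=2):
    """Return the top_k (name, score) pairs, highest cosine score first."""
    scored = [(name, cosine(proof_vec, vec)) for name, vec in premise_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy embeddings (assumption): real ones would come from a sentence encoder.
premises = {
    "premise_a": [1.0, 0.0, 0.0],
    "premise_b": [0.6, 0.8, 0.0],
    "premise_c": [0.0, 0.0, 1.0],
}
proof = [1.0, 1.0, 0.0]

top = rank_premises(proof, premises)
for name, score in top:
    print(f"{name}: {score:.4f}")
```

Because the approach is unsupervised, nothing is trained here: swapping the toy vectors for real embeddings would leave the ranking logic unchanged.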

2021

Unsupervised document summarization using pre-trained sentence embeddings and graph centrality
Juan Ramirez-Orta | Evangelos Milios
Proceedings of the Second Workshop on Scholarly Document Processing

This paper describes our submission for the LongSumm task in SDP 2021. We propose a method for incorporating sentence embeddings produced by deep language models into extractive summarization techniques based on graph centrality in an unsupervised manner. The proposed method is simple and fast, can summarize documents of any kind and size, and can satisfy any length constraint on the summaries produced. It offers performance competitive with more sophisticated supervised methods and can serve as a proxy for abstractive summarization techniques.
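One way to picture centrality-based extractive summarization under a length constraint is sketched below. The abstract does not say which centrality measure is used, so this sketch assumes weighted degree centrality over a cosine-similarity graph; the toy vectors stand in for sentence embeddings, and the function name `summarize` and the `max_words` parameter are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def summarize(sentences, vectors, max_words):
    """Pick the most central sentences that fit within a word budget."""
    n = len(sentences)
    # Weighted degree centrality: total similarity to every other sentence.
    centrality = [
        sum(cosine(vectors[i], vectors[j]) for j in range(n) if j != i)
        for i in range(n)
    ]
    order = sorted(range(n), key=lambda i: centrality[i], reverse=True)
    chosen, used = [], 0
    for i in order:
        words = len(sentences[i].split())
        if used + words <= max_words:  # skip sentences that exceed the budget
            chosen.append(i)
            used += words
    # Emit the selected sentences in their original document order.
    return [sentences[i] for i in sorted(chosen)]


sentences = [
    "Graphs model sentence similarity.",
    "Central sentences summarize the document.",
    "The weather was pleasant.",
]
# Toy embeddings (assumption): the first two sentences are close, the third is not.
vectors = [[1.0, 0.2, 0.0], [0.9, 0.4, 0.0], [0.0, 0.0, 1.0]]

summary = summarize(sentences, vectors, max_words=9)
print(summary)
```

The word budget is what lets a method of this kind meet an arbitrary length constraint: selection simply stops adding sentences once the budget is exhausted.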

2016

Non-uniform Language Detection in Technical Writing
Weibo Wang | Abidalrahman Moh’d | Aminul Islam | Axel Soto | Evangelos Milios
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing

DalGTM at SemEval-2016 Task 1: Importance-Aware Compositional Approach to Short Text Similarity
Jie Mei | Aminul Islam | Evangelos Milios
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Reddit Temporal N-gram Corpus and its Applications on Paraphrase and Semantic Similarity in Social Media using a Topic-based Latent Semantic Analysis
Anh Dang | Abidalrahman Moh’d | Aminul Islam | Rosane Minghim | Michael Smit | Evangelos Milios
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

This paper introduces a new large-scale n-gram corpus created specifically from social media text. Two characteristics distinguish this corpus: its monthly temporal attribute, and its construction from 1.65 billion comments of user-generated text on Reddit. The usefulness of the corpus is exemplified and evaluated by a novel Topic-based Latent Semantic Analysis (TLSA) algorithm. The experimental results show that unsupervised TLSA outperforms all the state-of-the-art unsupervised and semi-supervised methods in the SemEval 2015 tasks on paraphrase and semantic similarity in Twitter.