Vasile Rus

2018

A Tutorial Markov Analysis of Effective Human Tutorial Sessions
Nabin Maharjan | Vasile Rus
Proceedings of the 5th Workshop on Natural Language Processing Techniques for Educational Applications

This paper investigates what differentiates effective tutorial sessions from less effective sessions. Towards this end, we characterize and explore human tutors’ actions in tutorial dialogue sessions by mapping the tutor-tutee interactions, which are streams of dialogue utterances, into streams of actions, based on the language-as-action theory. Next, we use human expert judgment measures, evidence of learning (EL) and evidence of soundness (ES), to identify effective and ineffective sessions. We perform sub-sequence pattern mining to identify sub-sequences of dialogue modes that discriminate good sessions from bad sessions. Finally, we use the results of the sub-sequence analysis to generate a tutorial Markov process for effective tutorial sessions.

2017

DT_Team at SemEval-2017 Task 1: Semantic Similarity Using Alignments, Sentence-Level Embeddings and Gaussian Mixture Model Output
Nabin Maharjan | Rajendra Banjade | Dipesh Gautam | Lasang J. Tamang | Vasile Rus
Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017)

We describe our system (DT Team) submitted to the SemEval-2017 Task 1, Semantic Textual Similarity (STS) challenge for English (Track 5). We developed three different models with various features including similarity scores calculated using word and chunk alignments, word/sentence embeddings, and Gaussian Mixture Model (GMM). The correlation between our system’s output and the human judgments was up to 0.8536, which is more than 10% above baseline, and almost as good as the best performing system, which was at 0.8547 correlation (the difference is just about 0.1%). Also, our system produced leading results when evaluated with a separate STS benchmark dataset. The word alignment and sentence embedding based features were found to be very effective.

2016

SemAligner: A Method and Tool for Aligning Chunks with Semantic Relation Types and Semantic Similarity Scores
Nabin Maharjan | Rajendra Banjade | Nobal Bikram Niraula | Vasile Rus
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

This paper introduces a rule-based method and software tool, called SemAligner, for aligning chunks across texts in a given pair of short English texts. The tool, based on the top performing method at the Interpretable Short Text Similarity shared task at SemEval 2015, where it was used with human annotated (gold) chunks, can now additionally process plain text pairs using two powerful chunkers we developed, e.g. using Conditional Random Fields. Besides aligning chunks, the tool automatically assigns semantic relations to the aligned chunks (such as EQUI for equivalent and OPPO for opposite) and semantic similarity scores that measure the strength of the semantic relation between the aligned chunks. Experiments show that SemAligner performs competitively for system generated chunks and that these results are also comparable to results obtained on gold chunks. SemAligner has other capabilities such as handling various input formats and chunkers as well as extending lookup resources.

DT-Neg: Tutorial Dialogues Annotated for Negation Scope and Focus in Context
Rajendra Banjade | Vasile Rus
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Negation is often found to be more frequent in dialogue than in commonly written texts, such as literary texts. Furthermore, the scope and focus of negation depend more on context in dialogues than in other forms of text. Existing negation datasets have focused on non-dialogue texts such as literary texts, where the scope and focus of negation are normally present within the same sentence where the negation is located, and therefore are not the most appropriate to inform the development of negation handling algorithms for dialogue-based systems. In this paper, we present the DT-Neg corpus (DeepTutor Negation corpus), which contains texts extracted from tutorial dialogues where students interacted with an Intelligent Tutoring System (ITS) to solve conceptual physics problems. The DT-Neg corpus contains annotated negations in student responses with scope and focus marked based on the context of the dialogue. Our dataset contains 1,088 instances and is available for research purposes at http://language.memphis.edu/dt-neg.

DTSim at SemEval-2016 Task 1: Semantic Similarity Model Including Multi-Level Alignment and Vector-Based Compositional Semantics
Rajendra Banjade | Nabin Maharjan | Dipesh Gautam | Vasile Rus
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

DTSim at SemEval-2016 Task 2: Interpreting Similarity of Texts Based on Automated Chunking, Chunk Alignment and Semantic Relation Prediction
Rajendra Banjade | Nabin Maharjan | Nobal Bikram Niraula | Vasile Rus
Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016)

Evaluation Dataset (DT-Grade) and Word Weighting Approach towards Constructed Short Answers Assessment in Tutorial Dialogue Context
Rajendra Banjade | Nabin Maharjan | Nobal Bikram Niraula | Dipesh Gautam | Borhan Samei | Vasile Rus
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

Joint Inference for Mode Identification in Tutorial Dialogues
Deepak Venugopal | Vasile Rus
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers

Identifying dialogue acts and dialogue modes during tutorial interactions is a crucial sub-step in understanding patterns of effective tutor-tutee interactions. In this work, we develop a novel joint inference method that labels each utterance in a tutoring dialogue session with a dialogue act and a specific mode from a set of pre-defined dialogue acts and modes, respectively. Specifically, we develop our joint model using Markov Logic Networks (MLNs), a framework that combines first-order logic with probabilities, and is thus capable of representing complex, uncertain knowledge. We define first-order formulas in our MLN that encode the inter-dependencies between dialogue modes and more fine-grained dialogue actions. We then use joint inference to label both the modes and the dialogue acts in an utterance. We compare our system against a pipeline system based on SVMs on a real-world dataset with tutoring sessions of over 500 students. Our results show that the joint inference system is far more effective than the pipeline system in mode detection, and improves over the performance of the pipeline system by about 6 points in F1 score. The joint inference system also performs much better than the pipeline system in the context of labeling modes that highlight important pedagogical steps in tutoring.

2015

Judging the Quality of Automatically Generated Gap-fill Question using Active Learning
Nobal Bikram Niraula | Vasile Rus
Proceedings of the Tenth Workshop on Innovative Use of NLP for Building Educational Applications

An Optimal Quadratic Approach to Monolingual Paraphrase Alignment
Mihai Lintean | Vasile Rus
Proceedings of the 20th Nordic Conference of Computational Linguistics (NODALIDA 2015)

2014

On Paraphrase Identification Corpora
Vasile Rus | Rajendra Banjade | Mihai Lintean
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

In this paper, we analyze a number of data sets proposed over the last decade or so for the task of paraphrase identification. The goal of the analysis is to identify the advantages as well as shortcomings of the previously proposed data sets. Based on the analysis, we then make recommendations about how to improve the process of creating and using such data sets for evaluating future approaches to the task of paraphrase identification or the more general task of semantic similarity. The recommendations are meant to improve our understanding of what a paraphrase is, offer a fairer ground for comparing approaches, increase the diversity of actual linguistic phenomena that future data sets will cover, and offer ways to improve our understanding of the contributions of various modules or approaches proposed for solving the task of paraphrase identification or similar tasks.

The DARE Corpus: A Resource for Anaphora Resolution in Dialogue Based Intelligent Tutoring Systems
Nobal Niraula | Vasile Rus | Rajendra Banjade | Dan Stefanescu | William Baggett | Brent Morgan
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

We describe the DARE corpus, an annotated data set focusing on pronoun resolution in tutorial dialogue. Although data sets for general purpose anaphora resolution exist, they are not suitable for dialogue based Intelligent Tutoring Systems. To the best of our knowledge, no data set is currently available for pronoun resolution in dialogue based intelligent tutoring systems. The described DARE corpus consists of 1,000 annotated pronoun instances collected from conversations between high-school students and the intelligent tutoring system DeepTutor. The data set is publicly available.

Latent Semantic Analysis Models on Wikipedia and TASA
Dan Ștefănescu | Rajendra Banjade | Vasile Rus
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

This paper introduces a collection of freely available Latent Semantic Analysis models built on the entire English Wikipedia and the TASA corpus. The models differ not only in their source, Wikipedia versus TASA, but also in the linguistic items they focus on: all words, content-words, nouns-verbs, and main concepts. Generating such models from large datasets (e.g. Wikipedia), which can provide broad coverage of the actual vocabulary in use, is computationally challenging, which is the reason why large LSA models are rarely available. Our experiments show that for the task of word-to-word similarity, the scores assigned by these models are strongly correlated with human judgment, outperforming many other frequently used measures, and comparable to the state of the art.

2013

Towards a Structured Representation of Generic Concepts and Relations in Large Text Corpora
Archana Bhattarai | Vasile Rus
Proceedings of the International Conference Recent Advances in Natural Language Processing RANLP 2013

SEMILAR: The Semantic Similarity Toolkit
Vasile Rus | Mihai Lintean | Rajendra Banjade | Nobal Niraula | Dan Stefanescu
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics: System Demonstrations

2012

A Comparison of Greedy and Optimal Assessment of Natural Language Student Input Using Word-to-Word Similarity Metrics
Vasile Rus | Mihai Lintean
Proceedings of the Seventh Workshop on Building Educational Applications Using NLP

2006

The Look and Feel of a Confident Entailer
Vasile Rus | Art Graesser
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

The paper presents a software system that embodies a lexico-syntactic approach to the task of Textual Entailment. Although the approach is based on a minimal set of resources, it is highly confident. The architecture of the system is open and can be easily expanded with more and deeper processing modules. Results on a standard data set are presented.
