2024
pdf
abs
Efficient and Interpretable Information Retrieval for Product Question Answering with Heterogeneous Data
Biplob Biswas
|
Rajiv Ramnath
Proceedings of the Seventh Workshop on e-Commerce and NLP @ LREC-COLING 2024
Expansion-enhanced sparse lexical representation improves information retrieval (IR) by minimizing vocabulary mismatch problems during lexical matching. In this paper, we explore the potential of jointly learning dense semantic representation and combining it with the lexical one for ranking candidate information. We present a hybrid information retrieval mechanism that maximizes lexical and semantic matching while minimizing their shortcomings. Our architecture consists of dual hybrid encoders that independently encode queries and information elements. Each encoder jointly learns a dense semantic representation and a sparse lexical representation augmented by a learnable term expansion of the corresponding text through contrastive learning. We demonstrate the efficacy of our model in single-stage ranking of a benchmark product question-answering dataset containing the typical heterogeneous information available on online product pages. Our evaluation demonstrates that our hybrid approach outperforms independently trained retrievers by 10.95% (sparse) and 2.7% (dense) in MRR@5 score. Moreover, our model offers better interpretability and performs comparably to state-of-the-art cross-encoders while reducing response time by 30% (latency) and cutting computational load by approximately 38% (FLOPs).
2022
pdf
abs
Retrieval Based Response Letter Generation For a Customer Care Setting
Biplob Biswas
|
Renhao Cui
|
Rajiv Ramnath
Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies: Industry Track
Letter-like communications (such as email) are a major means of customer relationship management within customer-facing organizations. These communications are initiated on a channel by requests from customers and then responded to by the organization on the same channel. For decades, the job has almost entirely been conducted by human agents who attempt to provide the most appropriate reaction to the request. Rules have been made to standardize the overall customer service process and make sure the customers receive professional responses. Recent progress in natural language processing has made it possible to automate response generation. However, the diversity and open nature of customer queries and the lack of structured knowledge bases make this task even more challenging than typical task-oriented language generation tasks. Keeping those obstacles in mind, we propose a deep-learning based response letter generation framework that attempts to retrieve knowledge from historical responses and utilize it to generate an appropriate reply. Our model uses data augmentation to address the insufficiency of query-response pairs and employs a ranking mechanism to choose the best response from multiple potential options. We show that our technique outperforms the baselines by significant margins while producing consistent and informative responses.
2020
pdf
bib
abs
Sequence-to-Set Semantic Tagging for Complex Query Reformulation and Automated Text Categorization in Biomedical IR using Self-Attention
Manirupa Das
|
Juanxi Li
|
Eric Fosler-Lussier
|
Simon Lin
|
Steve Rust
|
Yungui Huang
|
Rajiv Ramnath
Proceedings of the 19th SIGBioMed Workshop on Biomedical Language Processing
Novel contexts, comprising a set of terms referring to one or more concepts, may often arise in complex querying scenarios such as in evidence-based medicine (EBM) involving biomedical literature. These may not explicitly refer to entities or canonical concept forms occurring in a fact-based knowledge source, e.g. the UMLS ontology. Moreover, hidden associations between related concepts meaningful in the current context, may not exist within a single document, but across documents in the collection. Predicting semantic concept tags of documents can therefore serve to associate documents related in unseen contexts, or categorize them, in information filtering or retrieval scenarios. Thus, inspired by the success of sequence-to-sequence neural models, we develop a novel sequence-to-set framework with attention, for learning document representations in a unique unsupervised setting, using no human-annotated document labels or external knowledge resources and only corpus-derived term statistics to drive the training, that can effect term transfer within a corpus for semantically tagging a large collection of documents. Our sequence-to-set modeling approach to predict semantic tags, gives to the best of our knowledge, the state-of-the-art for both, an unsupervised query expansion (QE) task for the TREC CDS 2016 challenge dataset when evaluated on an Okapi BM25–based document retrieval system; and also over the MLTM system baseline baseline (Soleimani and Miller, 2016), for both supervised and semi-supervised multi-label prediction tasks on the del.icio.us and Ohsumed datasets. We make our code and data publicly available.
2018
pdf
abs
Phrase2VecGLM: Neural generalized language model–based semantic tagging for complex query reformulation in medical IR
Manirupa Das
|
Eric Fosler-Lussier
|
Simon Lin
|
Soheil Moosavinasab
|
David Chen
|
Steve Rust
|
Yungui Huang
|
Rajiv Ramnath
Proceedings of the BioNLP 2018 workshop
In this work, we develop a novel, completely unsupervised, neural language model-based document ranking approach to semantic tagging of documents, using the document to be tagged as a query into the GLM to retrieve candidate phrases from top-ranked related documents, thus associating every document with novel related concepts extracted from the text. For this we extend the word embedding-based general language model due to Ganguly et al 2015, to employ phrasal embeddings, and use the semantic tags thus obtained for downstream query expansion, both directly and in feedback loop settings. Our method, evaluated using the TREC 2016 clinical decision support challenge dataset, shows statistically significant improvement not only over various baselines that use standard MeSH terms and UMLS concepts for query expansion, but also over baselines using human expert–assigned concept tags for the queries, run on top of a standard Okapi BM25–based document retrieval system.