Junlin Jiang


2025

Implementing Retrieval Augmented Generation Technique on Unstructured and Structured Data Sources in a Call Center of a Large Financial Institution
Syed Shariyar Murtaza | Yifan Nie | Elias Avan | Utkarsh Soni | Wanyu Liao | Adam Carnegie | Cyril John Mathias | Junlin Jiang | Eugene Wen
Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 3: Industry Track)

The retrieval-augmented generation (RAG) technique enables generative AI models to extract accurate facts from external unstructured data sources. For structured data, RAG is further augmented by function calls to query databases. This paper presents an industrial case study that implements RAG in a large financial institution’s call center. The study showcases experiences and architecture for a scalable RAG deployment. It also introduces enhancements to RAG for retrieving facts from structured data sources using data embeddings, achieving low latency and high reliability. Our optimized production application demonstrates an average response time of only 7.33 seconds. Additionally, the paper compares various open-source and closed-source models for answer generation in an industrial context.

2020

Comparison of Machine Learning Methods for Multi-label Classification of Nursing Education and Licensure Exam Questions
John Langton | Krishna Srihasam | Junlin Jiang
Proceedings of the 3rd Clinical Natural Language Processing Workshop

In this paper, we evaluate several machine learning methods for multi-label classification of text questions. Every nursing student in the United States must pass the National Council Licensure Examination (NCLEX) to begin professional practice. NCLEX defines a number of competencies on which students are evaluated. By labeling test questions with NCLEX competencies, we can score students according to their performance in each competency. This information helps instructors measure how prepared students are for the NCLEX, as well as which competencies they may need help with. A key challenge is that questions may be related to more than one competency. Labeling questions with NCLEX competencies, therefore, equates to a multi-label, text classification problem where each competency is a label. Here we present an evaluation of several methods to support this use case along with a proposed approach. While our work is grounded in the nursing education domain, the methods described here can be used for any multi-label, text classification use case.