Linqing Chen
Also published as:
林卿 陈
In this paper, we propose a novel system that integrates state-of-the-art, domain-specific large language models with advanced information retrieval techniques to deliver comprehensive and context-aware responses. Our approach facilitates seamless interaction among diverse components, enabling cross-validation of outputs to produce accurate, high-quality responses enriched with relevant data, images, tables, and other modalities. We demonstrate the system’s capability to enhance response precision by leveraging a robust question-answering model, significantly improving the quality of dialogue generation. The system provides an accessible platform for real-time, high-fidelity interactions, allowing users to benefit from efficient human-computer interaction, precise retrieval, and simultaneous access to a wide range of literature and data. This dramatically improves the research efficiency of professionals in the biomedical and pharmaceutical domains and facilitates faster, more informed decision-making throughout the R&D process. Furthermore, the system proposed in this paper is available at https://synapse-chat.patsnap.com.
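The abstract does not include code; as a rough illustration of the retrieve-then-generate-then-cross-validate idea it describes, here is a minimal Python sketch. All names (`retrieve`, `generate_candidates`, `cross_validate`) and the keyword-overlap scoring are hypothetical stand-ins, not the system's actual components or APIs.

```python
# Minimal sketch of a retrieve -> generate -> cross-validate loop.
# Every component below is a toy placeholder for the domain-specific
# LLMs and retrieval backends described in the abstract.
from collections import Counter
from typing import List


def retrieve(query: str, corpus: List[str], k: int = 3) -> List[str]:
    """Rank passages by simple keyword overlap with the query (placeholder retriever)."""
    q_tokens = set(query.lower().split())
    scored = [(len(q_tokens & set(p.lower().split())), p) for p in corpus]
    scored.sort(key=lambda x: x[0], reverse=True)
    return [p for score, p in scored[:k] if score > 0]


def generate_candidates(query: str, passages: List[str]) -> List[str]:
    """Placeholder 'generator': one candidate answer per retrieved passage.
    A real system would call a domain-specific LLM here."""
    return [f"Based on: {p}" for p in passages]


def cross_validate(candidates: List[str], passages: List[str]) -> str:
    """Keep the candidate supported by the most retrieved passages (toy agreement check)."""
    if not candidates:
        return "No answer found."
    support = Counter()
    for i, cand in enumerate(candidates):
        support[i] = sum(
            1 for p in passages if set(p.lower().split()) & set(cand.lower().split())
        )
    best = max(support, key=support.get)
    return candidates[best]


if __name__ == "__main__":
    corpus = [
        "Aspirin irreversibly inhibits cyclooxygenase enzymes.",
        "Ibuprofen is a reversible COX inhibitor.",
        "Paracetamol acts mainly in the central nervous system.",
    ]
    query = "How does aspirin affect cyclooxygenase enzymes?"
    passages = retrieve(query, corpus)
    print(cross_validate(generate_candidates(query, passages), passages))
```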
Recent developments in Retrieval-Augmented Large Language Models (LLMs) have shown great promise in biomedical applications. However, a critical gap persists in reliably evaluating their curation ability: the process by which models select and integrate relevant references while filtering out noise. To address this, we introduce the benchmark for Curation of Retrieval-Augmented LLMs in Biomedicine (CRAB), the first multilingual benchmark tailored for evaluating the biomedical curation of retrieval-augmented LLMs, available in English, French, German, and Chinese. By incorporating a novel citation-based evaluation metric, CRAB quantifies the curation performance of retrieval-augmented LLMs in biomedicine. Experimental results reveal significant discrepancies in the curation performance of mainstream LLMs, underscoring the urgent need to improve curation in the biomedical domain.
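CRAB's exact citation-based metric is not reproduced here; the sketch below only illustrates the general idea of scoring curation by comparing the references a model actually cites against the references known to be relevant. The precision/recall/F1 formulation and the reference identifiers are assumptions, not the benchmark's definition.

```python
# Toy illustration of a citation-based curation score: compare the references
# a model cites in its answer against the gold set of relevant references.
# This is NOT the CRAB metric, only a generic precision/recall formulation.
from typing import Dict, Set


def curation_scores(cited: Set[str], relevant: Set[str]) -> Dict[str, float]:
    """Precision: fraction of cited references that are relevant.
    Recall: fraction of relevant references that were cited."""
    if not cited or not relevant:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    tp = len(cited & relevant)
    precision = tp / len(cited)
    recall = tp / len(relevant)
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Model cited refs 1, 2, and 5; refs 1, 2, and 3 were actually relevant.
    print(curation_scores({"ref1", "ref2", "ref5"}, {"ref1", "ref2", "ref3"}))
    # precision = recall = f1 = 2/3
```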
In the field of Natural Language Processing (NLP), Large-scale Language Models (LLMs) have demonstrated exceptional capabilities across a variety of tasks, including question answering, classification, and particularly, natural language understanding. The integration of neural machine translation with LLMs presents significant potential, transforming the paradigms of cross-lingual communication and information exchange. This study investigates the foundational aspects of LLMs’ translation abilities and identifies effective training methodologies to equip them with multilingual capacities. We specifically explore the optimal timing for introducing translation capabilities to LLMs via supervised tasks, considering the inherent bilingual nature of machine translation. Key questions explored include whether it is more beneficial to integrate multiple languages during the pre-training or supervised fine-tuning (SFT) stages, how variations in language ratios influence LLMs’ translation abilities, and whether longer or shorter texts are more effective for training these models. This research conducts a thorough investigation by training multiple LLMs from scratch with parameter scales in the billions and enhances the robustness of our findings by upgrading the language capabilities of pre-trained open-source models with parameter scales reaching tens of billions. The aim is to provide a detailed analysis that elucidates the complexities of augmenting machine translation capabilities within LLMs.
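The study's training recipes are not given in code; the following sketch only illustrates the kind of language-ratio control discussed above, i.e., sampling a chosen proportion of parallel translation examples into a fine-tuning mix. The ratio, data shapes, and sampling scheme are assumptions for illustration only.

```python
# Toy sketch of controlling the translation/monolingual ratio in a fine-tuning mix.
# Ratios, data shapes, and the random sampling scheme are illustrative only and
# do not correspond to the study's actual training configuration.
import random
from typing import Dict, List


def build_sft_mix(parallel: List[Dict], monolingual: List[Dict],
                  translation_ratio: float, size: int, seed: int = 0) -> List[Dict]:
    """Return `size` examples with roughly `translation_ratio` drawn from parallel data."""
    rng = random.Random(seed)
    n_parallel = int(size * translation_ratio)
    mix = (rng.choices(parallel, k=n_parallel) +
           rng.choices(monolingual, k=size - n_parallel))
    rng.shuffle(mix)
    return mix


if __name__ == "__main__":
    parallel = [{"src": "Hello", "tgt": "你好", "task": "translate"}]
    monolingual = [{"text": "Language models predict the next token.", "task": "lm"}]
    mix = build_sft_mix(parallel, monolingual, translation_ratio=0.3, size=10)
    n_translate = sum(ex["task"] == "translate" for ex in mix)
    print(f"{n_translate} of {len(mix)} examples are translation pairs")
```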