2025
Streamlining Biomedical Research with Specialized LLMs
Linqing Chen
Proceedings of the 31st International Conference on Computational Linguistics: System Demonstrations
In this paper, we propose a novel system that integrates state-of-the-art, domain-specific large language models with advanced information retrieval techniques to deliver comprehensive and context-aware responses. Our approach facilitates seamless interaction among diverse components, enabling cross-validation of outputs to produce accurate, high-quality responses enriched with relevant data, images, tables, and other modalities. We demonstrate the system's capability to enhance response precision by leveraging a robust question-answering model, significantly improving the quality of dialogue generation. The system provides an accessible platform for real-time, high-fidelity interactions, allowing users to benefit from efficient human-computer interaction, precise retrieval, and simultaneous access to a wide range of literature and data. This dramatically improves the research efficiency of professionals in the biomedical and pharmaceutical domains and facilitates faster, more informed decision-making throughout the R&D process. Furthermore, the system is available at https://synapse-chat.patsnap.com.
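The abstract outlines a retrieve, answer, cross-validate loop. Below is a minimal Python sketch of that loop under loose assumptions: the term-overlap retriever, the token-support check, and all names (Passage, retrieve, cross_validate, answer) are illustrative stand-ins, not Synapse's actual components.

```python
# Hypothetical sketch of a retrieve -> answer -> cross-validate pipeline;
# every component here is a toy stand-in, not the deployed system.
from dataclasses import dataclass


@dataclass
class Passage:
    doc_id: str
    text: str


def retrieve(query: str, corpus: list[Passage], k: int = 3) -> list[Passage]:
    """Rank passages by naive term overlap (stand-in for a real retriever)."""
    q_terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda p: len(q_terms & set(p.text.lower().split())),
        reverse=True,
    )
    return scored[:k]


def cross_validate(draft: str, evidence: list[Passage]) -> bool:
    """Keep a draft answer only if each of its tokens is supported by the
    retrieved evidence (a toy reading of the cross-validation step)."""
    support = " ".join(p.text.lower() for p in evidence)
    return all(tok in support for tok in draft.lower().split())


def answer(query: str, corpus: list[Passage]) -> str:
    evidence = retrieve(query, corpus)
    draft = evidence[0].text if evidence else ""  # placeholder for the QA model
    return draft if cross_validate(draft, evidence) else "insufficient evidence"
```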
CRAB: A Benchmark for Evaluating Curation of Retrieval-Augmented LLMs in Biomedicine
Hanmeng Zhong | Linqing Chen | Wentao Wu | Weilei Wang
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Recent developments in Retrieval-Augmented Large Language Models (LLMs) have shown great promise in biomedical applications. However, a critical gap persists in reliably evaluating their curation ability: the process by which models select and integrate relevant references while filtering out noise. To address this, we introduce the Curation of Retrieval-Augmented LLMs in Biomedicine benchmark (CRAB), the first multilingual benchmark tailored for evaluating the biomedical curation of retrieval-augmented LLMs, available in English, French, German, and Chinese. By incorporating a novel citation-based evaluation metric, CRAB quantifies the curation performance of retrieval-augmented LLMs in biomedicine. Experimental results reveal significant discrepancies in the curation performance of mainstream LLMs, underscoring the urgent need to improve curation in the biomedical domain.
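As one concrete reading of "citation-based evaluation metric", here is a hedged sketch: it assumes curation is scored by comparing the reference IDs a model cites against a gold set of relevant references, which may differ from the exact CRAB metric.

```python
# A minimal sketch of a citation-based curation score, assuming cited
# references are compared against a gold relevant set; the real CRAB
# metric may be defined differently.
def curation_f1(cited: set[str], relevant: set[str]) -> float:
    """F1 over cited reference IDs: citing noise lowers precision,
    missing relevant references lowers recall."""
    if not cited or not relevant:
        return 0.0
    tp = len(cited & relevant)
    precision = tp / len(cited)
    recall = tp / len(relevant)
    return 0.0 if tp == 0 else 2 * precision * recall / (precision + recall)


# Example: the model cites one relevant and one noise reference,
# out of two relevant references in total.
print(curation_f1({"ref1", "ref9"}, {"ref1", "ref2"}))  # 0.5
```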
2024
E3: Optimizing Language Model Training for Translation via Enhancing Efficiency and Effectiveness
Linqing Chen | Weilei Wang | Dongyang Hu
Proceedings of the 23rd Chinese National Conference on Computational Linguistics (Volume 1: Main Conference)
In the field of Natural Language Processing (NLP), Large-scale Language Models (LLMs) have demonstrated exceptional capabilities across a variety of tasks, including question answering, classification, and particularly, natural language understanding. The integration of neural machine translation with LLMs presents significant potential, transforming the paradigms of cross-lingual communication and information exchange. This study investigates the foundational aspects of LLMs' translation abilities and identifies effective training methodologies to equip them with multilingual capacities. We specifically explore the optimal timing for introducing translation capabilities to LLMs via supervised tasks, considering the inherent bilingual nature of machine translation. Key questions explored include whether it is more beneficial to integrate multiple languages during the pre-training or supervised fine-tuning (SFT) stages, how variations in language ratios influence LLMs' translation abilities, and whether longer or shorter texts are more effective for training these models. This research conducts a thorough investigation by training multiple LLMs from scratch with parameter scales in the billions and enhances the robustness of our findings by upgrading the language capabilities of pre-trained open-source models with parameter scales reaching tens of billions. The aim is to provide a detailed analysis that elucidates the complexities of augmenting machine translation capabilities within LLMs.
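One variable the study manipulates is the language ratio of the training mixture. The sketch below shows what ratio-controlled sampling can look like in practice; the corpora, ratios, and function names are hypothetical and not taken from the paper.

```python
# Illustrative sketch of ratio-controlled mixing of multilingual training
# data; the 0.7/0.3 split and corpus contents are invented for the example.
import random


def sample_batch(corpora: dict[str, list[str]], ratios: dict[str, float],
                 batch_size: int, seed: int = 0) -> list[str]:
    """Draw examples so each language appears roughly in proportion to its ratio."""
    rng = random.Random(seed)
    langs = list(ratios)
    weights = [ratios[lang] for lang in langs]
    batch = []
    for _ in range(batch_size):
        lang = rng.choices(langs, weights=weights, k=1)[0]
        batch.append(rng.choice(corpora[lang]))
    return batch


corpora = {"en": ["an English sentence"], "zh": ["一个中文句子"]}
print(sample_batch(corpora, {"en": 0.7, "zh": 0.3}, batch_size=4))
```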
2022
Generation Model for Hierarchical Extreme Multi-label Text Classification (生成模型在层次结构极限多标签文本分类中的应用)
Linqing Chen (陈林卿) | Dawang He (何大望) | Yansi Xiao (肖燕思) | Yilin Liu (刘依林) | Jianping Lu (陆剑平) | Weilei Wang (王为磊)
Proceedings of the 21st Chinese National Conference on Computational Linguistics
Hierarchical extreme multi-label text classification is an important and challenging problem in natural language processing. The task involves an enormous, self-contained label set, and the labels exhibit dependencies across hierarchy levels as well as correlations within the same level, which further increases the difficulty. This paper formulates hierarchical extreme multi-label text classification as a sequence transduction problem, treating the output labels as a sequence so that the labels relevant to a text can be generated directly from a space of hundreds of thousands of labels. A soft-constraint mechanism and a compound vocabulary mapping exploit the hierarchical structure of, and correlations among, labels during decoding. Experimental results show that the proposed method achieves meaningful performance gains over baseline models. Further analysis shows that the method not only captures and exploits hypernym-hyponym relations between labels at different levels, but is also tolerant of the noise inherent in extreme multi-label taxonomies.
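To make the soft-constraint idea concrete, here is a toy sketch in which decoding scores are boosted, rather than hard-masked, for labels consistent with the labels generated so far; the bonus scheme, label names, and parents map are illustrative assumptions, not the paper's mechanism.

```python
# Toy soft hierarchy constraint at decoding time: labels whose parent was
# already generated (or that are hierarchy roots) get a score bonus instead
# of a hard mask. Bonus value and hierarchy are invented for the example.
def rescore(scores: dict[str, float], parents: dict[str, str],
            generated: list[str], bonus: float = 2.0) -> dict[str, float]:
    """Boost hierarchy-consistent labels; labels missing from `parents`
    are treated as roots."""
    emitted = set(generated)
    return {
        label: s + bonus if parents.get(label) in emitted | {None} else s
        for label, s in scores.items()
    }


parents = {"cnn": "deep-learning", "deep-learning": None}
scores = {"cnn": 1.0, "deep-learning": 1.2}
# "deep-learning" is a root and "cnn"'s parent was emitted, so both get the bonus.
print(rescore(scores, parents, generated=["deep-learning"]))
```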
2020
Hierarchical Global Context Augmented Document-level Neural Machine Translation (层次化结构全局上下文增强的篇章级神经机器翻译)
Linqing Chen (陈林卿) | Junhui Li (李军辉) | Zhengxian Gong (贡正仙)
Proceedings of the 19th Chinese National Conference on Computational Linguistics
How to effectively exploit document-level context has long been a major challenge in document-level neural machine translation. This paper proposes improving document-level neural machine translation with a hierarchical global context derived from the entire document. To this end, the model captures the dependencies between each word in the current sentence and all sentences and words in the document, and combines dependencies at different levels to obtain a global context carrying hierarchical document information. Each word in the current source sentence thus obtains its own context integrating both word-level and sentence-level dependencies. To fully exploit the advantages of parallel sentence pairs during training, we adopt a two-step training scheme: a model first trained on sentence-level data is further trained on document-level data to acquire the ability to capture global context. Experiments on several benchmark datasets show that the proposed model achieves meaningful translation quality improvements over several strong baselines, and further show that context incorporating hierarchical document information outperforms word-level context alone. In addition, we combine the global context with the translation model in different ways and observe the effect on model performance, and we conduct a preliminary study of how the global context is distributed across the document in document-level translation.
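The following numpy sketch illustrates the hierarchical combination the abstract describes: one word attends over all document words and over sentence summaries, and the two contexts are fused by a gate. The shapes, the plain dot-product attention, and the fixed gate are simplifying assumptions rather than the paper's exact architecture.

```python
# Minimal sketch of hierarchical global context for one source word:
# word-level attention over all document words plus sentence-level attention
# over sentence summaries, mixed by a gate. All shapes are illustrative.
import numpy as np


def attend(query: np.ndarray, keys: np.ndarray) -> np.ndarray:
    """Dot-product attention of one query vector over a key/value matrix."""
    logits = keys @ query
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    return weights @ keys


def hierarchical_context(word: np.ndarray, doc_words: np.ndarray,
                         sent_summaries: np.ndarray,
                         gate: float = 0.5) -> np.ndarray:
    """Mix word-level and sentence-level global context for one word."""
    word_ctx = attend(word, doc_words)        # dependencies on all words
    sent_ctx = attend(word, sent_summaries)   # dependencies on all sentences
    return gate * word_ctx + (1.0 - gate) * sent_ctx


rng = np.random.default_rng(0)
word = rng.normal(size=8)
ctx = hierarchical_context(word, rng.normal(size=(20, 8)),
                           rng.normal(size=(4, 8)))
print(ctx.shape)  # (8,) - one fused context vector per source word
```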