2025
Assessing Minimal Pairs of Chinese Verb-Resultative Complement Constructions: Insights from Language Models
Xinyao Huang | Yue Pan | Stefan Hartmann | Yang Yanning
Proceedings of the Second International Workshop on Construction Grammars and NLP
Chinese verb-resultative complement constructions (VRCCs) constitute a distinctive syntactic-semantic pattern that integrates agent-patient dynamics with real-world state changes; yet widely used benchmarks such as CLiMP and ZhoBLiMP provide few minimal-pair probes tailored to these constructions. We introduce ZhVrcMP, a 1,204-pair dataset spanning two paradigms: resultative complement presence versus absence, and verb–complement order. The examples are drawn from Modern Chinese and are annotated for linguistic validity. Using mean log probability scoring, we evaluate Zh-Pythia models (14M-1.4B) and Mistral-7B-Instruct-v0.3. Larger Zh-Pythia models perform strongly, especially on the order paradigm, reaching 89.87% accuracy. Mistral-7B-Instruct-v0.3 shows lower perplexity yet weaker overall accuracy, underscoring the remaining difficulty of modeling constructional semantics in Chinese.
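As a rough illustration of the mean log probability scoring mentioned in this abstract, the sketch below scores each sentence by the average of its per-token log probabilities and counts a minimal pair as correct when the acceptable sentence scores higher. The function names and the input layout (lists of per-token log probabilities already obtained from a model) are assumptions made for illustration; the paper's actual evaluation code is not shown in the abstract.

```python
from typing import Sequence, Tuple


def mean_log_prob(token_log_probs: Sequence[float]) -> float:
    """Average per-token log probability of one sentence."""
    if not token_log_probs:
        raise ValueError("sentence has no tokens")
    return sum(token_log_probs) / len(token_log_probs)


def minimal_pair_accuracy(
    pairs: Sequence[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Fraction of pairs where the acceptable sentence scores higher.

    Each pair is (acceptable_log_probs, unacceptable_log_probs).
    This layout is hypothetical, chosen to keep the sketch model-free.
    """
    if not pairs:
        raise ValueError("no pairs to evaluate")
    correct = sum(
        1 for good, bad in pairs if mean_log_prob(good) > mean_log_prob(bad)
    )
    return correct / len(pairs)


if __name__ == "__main__":
    toy_pairs = [
        ([-1.0, -2.0], [-3.0, -3.0, -3.0]),  # acceptable sentence wins
        ([-4.0], [-1.0, -1.0]),              # acceptable sentence loses
    ]
    print(minimal_pair_accuracy(toy_pairs))
```

Averaging rather than summing keeps sentences of different lengths comparable, which matters when one member of the pair omits the complement.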
2023
Large Scale Sequence-to-Sequence Models for Clinical Note Generation from Patient-Doctor Conversations
Gagandeep Singh | Yue Pan | Jesus Andres-Ferrer | Miguel Del-Agua | Frank Diehl | Joel Pinto | Paul Vozila
Proceedings of the 5th Clinical Natural Language Processing Workshop
We present our work on building large-scale sequence-to-sequence models for generating clinical notes from patient-doctor conversations. This is formulated as an abstractive summarization task, for which we use an encoder-decoder transformer model with a pointer-generator. We discuss various modeling enhancements to this baseline model, which include using a subword and multiword tokenization scheme, prefixing the targets with a chain-of-clinical-facts, and training with a contrastive loss that is defined over various candidate summaries. We also use flash attention during training and query chunked attention during inference to process long input and output sequences and to improve computational efficiency. Experiments are conducted on a dataset containing about 900K encounters from around 1800 healthcare providers covering 27 specialties. The results are broken down into primary care and non-primary care specialties. Consistent accuracy improvements are observed across both categories.
2021
A Comparative Study of Collocation Extraction Methods from the Perspectives of Vocabulary and Grammar: A Case Study in the Field of Journalism
Lulu Gu | Yue Pan | Pengyuan Liu
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation
2020
Generating Medical Reports from Patient-Doctor Conversations Using Sequence-to-Sequence Models
Seppo Enarvi | Marilisa Amoia | Miguel Del-Agua Teba | Brian Delaney | Frank Diehl | Stefan Hahn | Kristina Harris | Liam McGrath | Yue Pan | Joel Pinto | Luca Rubini | Miguel Ruiz | Gagandeep Singh | Fabian Stemmer | Weiyi Sun | Paul Vozila | Thomas Lin | Ranjani Ramamurthy
Proceedings of the First Workshop on Natural Language Processing for Medical Conversations
We discuss automatic creation of medical reports from ASR-generated patient-doctor conversational transcripts using an end-to-end neural summarization approach. We explore both recurrent neural network (RNN) and Transformer-based sequence-to-sequence architectures for summarizing medical conversations. We have incorporated enhancements to these architectures, such as the pointer-generator network, which facilitates copying parts of the conversations to the reports, and a hierarchical RNN encoder that makes RNN training three times faster with long inputs. A comparison of the relative improvements from the different model architectures over an oracle extractive baseline is provided on a dataset of 800k orthopedic encounters. Consistent with observations in the literature for machine translation and related tasks, we find that the Transformer models outperform RNNs in accuracy while taking less than half the time to train. The large gains over a strong oracle baseline indicate that sequence-to-sequence modeling is a promising approach for automatic generation of medical reports, given data at scale.
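To make the copying behaviour of a pointer-generator network concrete, the sketch below mixes a vocabulary distribution with the attention over source tokens, following the commonly cited formulation in which a gate `p_gen` weights generation against copying. The abstract does not spell out the exact variant the authors used, so the function name, the dictionary-based representation, and the toy values are all assumptions for illustration.

```python
from typing import Dict, Sequence


def pointer_generator_distribution(
    p_gen: float,
    vocab_dist: Dict[str, float],
    attention: Sequence[float],
    source_tokens: Sequence[str],
) -> Dict[str, float]:
    """Combine generation and copy probabilities for one decoding step.

    final(w) = p_gen * vocab_dist(w)
               + (1 - p_gen) * sum of attention on source positions equal to w
    """
    if len(attention) != len(source_tokens):
        raise ValueError("attention and source_tokens must align")
    # Start from the scaled generation distribution.
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    # Add copy mass; source words outside the vocabulary get an entry too.
    for token, weight in zip(source_tokens, attention):
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


if __name__ == "__main__":
    dist = pointer_generator_distribution(
        p_gen=0.5,
        vocab_dist={"pain": 0.5, "knee": 0.5},
        attention=[0.25, 0.75],
        source_tokens=["knee", "meniscus"],
    )
    for word, prob in sorted(dist.items()):
        print(word, prob)
```

In the toy call, "meniscus" is absent from the vocabulary distribution yet still receives probability through the copy term, which is how rare terms from the conversation can reach the report.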
2005
Improved-Edit-Distance Kernel for Chinese Relation Extraction
Wanxiang Che | Jianmin Jiang | Zhong Su | Yue Pan | Ting Liu
Companion Volume to the Proceedings of Conference including Posters/Demos and tutorial abstracts
2001
Advances in meeting recognition
Alex Waibel | Hua Yu | Tanja Schultz | Yue Pan | Michael Bett | Martin Westphal | Hagen Soltau | Thomas Schaaf | Florian Metze
Proceedings of the First International Conference on Human Language Technology Research