2025
An LLM-based Temporal-spatial Data Generation and Fusion Approach for Early Detection of Late Onset Alzheimer’s Disease (LOAD) Stagings Especially in Chinese and English-speaking Populations
Yang Han | Jacqueline C.K. Lam | Victor O.K. Li | Lawrence Y. L. Cheung
Findings of the Association for Computational Linguistics: EMNLP 2025
Alzheimer’s Disease (AD), the 7th leading cause of death globally, demands scalable methods for early detection. While speech-based diagnostics offer promise, existing approaches struggle with temporal-spatial (T-S) challenges in capturing subtle linguistic shifts across different disease stages (temporal) and in adapting to cross-linguistic variability (spatial). This study introduces a novel Large Language Model (LLM)-driven T-S fusion framework that integrates multilingual LLMs, contrastive learning, and interpretable marker discovery to revolutionize Late Onset AD (LOAD) detection. Our key innovations include: (1) T-S Data Imputation: Leveraging LLMs to generate synthetic speech transcripts across different LOAD stages (NC, Normal Control; eMCI, early Mild Cognitive Impairment; lMCI, late Mild Cognitive Impairment; AD) and languages (Chinese, English, Spanish), addressing data scarcity while preserving clinical relevance (expert validation: 86% agreement with LLM-generated labels). (2) T-S Transformer with Contrastive Learning: A multilingual model that disentangles stage-specific (temporal) and language-specific (spatial) patterns, achieving a notable improvement of 10.9–24.7% in F1-score over existing baselines. (3) Cross-Linguistic Marker Discovery: Identifying language-agnostic markers and language-specific patterns to enhance interpretability for clinical adoption. By unifying temporal LOAD stages and spatial diversity, our framework achieves state-of-the-art performance in early LOAD detection while enabling cross-linguistic diagnostics. This study bridges NLP and clinical neuroscience, demonstrating LLMs’ potential to amplify limited biomedical data and advance equitable healthcare AI.
2010
Tree Topological Features for Unlexicalized Parsing
Samuel W. K. Chan | Lawrence Y. L. Cheung | Mickey W. C. Chong
Coling 2010: Posters
2006
Court Stenography-To-Text (“STT”) in Hong Kong: A Jurilinguistic Engineering Effort
Benjamin K. Tsou | Tom B.Y. Lai | K.K. Sin | Lawrence Y.L. Cheung
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
Implementation of legal bilingualism in Hong Kong after 1997 has necessitated the production of voluminous court proceedings and judgments in both Chinese and English. For the former, Cantonese, a dialect of Chinese, is the home language of more than 90% of the population in Hong Kong and is therefore the variety used in the courts. To record speech in Cantonese verbatim, a Chinese Computer-Aided Transcription system has been developed. The transcription system converts stenographic codes into Chinese text, i.e. from a phonetic to an orthographic representation of the language. The main challenge lies in resolving the severe ambiguity that results from homocode problems in the conversion process, since Cantonese Chinese is typified by extensive homonymy. An N-gram statistical model is employed to estimate the most probable character string for the input transcription codes. Domain-specific corpora have been compiled to support the statistical computation. To improve accuracy, scalable techniques such as domain-specific transcription and special encoding are used. Put together, these techniques deliver 96% transcription accuracy.
2002
Alignment and Extraction of Bilingual Legal Terminology from Context Profiles
Oi Yee Kwong | Benjamin K. Tsou | Tom B.Y. Lai | Robert W.P. Luk | Lawrence Y.L. Cheung | Francis C.Y. Chik
COLING-02: COMPUTERM 2002: Second International Workshop on Computational Terminology
2000
Jurilinguistic Engineering in Cantonese Chinese: An N-gram-based Speech to Text Transcription System
B. K. T’sou | K. K. Sin | S. W. K. Chan | T. B. Y. Lai | C Lun | K. T. Ko | G. K. K. Chan | L. Y. L. Cheung
COLING 2000 Volume 2: The 18th International Conference on Computational Linguistics
Automatic Conversion from Phonetic to Textual Representation of Cantonese : The Case of Hong Kong Court Proceedings
Benjamin K. Tsou | K.K. Sin | Samuel W. K. Chan | Tom B. Y. Lai | Caesar Lun | K. T. Ko | Gary K. K. Chan | Lawrence Y. L. Cheung
Proceedings of the 14th Pacific Asia Conference on Language, Information and Computation