Stephanie Hilary Xinyi Ma
2026
CoachLah: A Singlish–English Parallel Corpus of Health Coaching Conversations with Behavior Goal Annotations
Iva Bojic | Mathieu Ravaut | Stephanie Hilary Xinyi Ma | Doreen Tan | Andy Hau Yan Ho | Andy Khong
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Health coaching (HC) aims to promote sustainable behavior change through goal-oriented dialogue, but research in this area is limited by the scarcity of authentic, transcript-based corpora. Existing datasets are small, English-only, and Western-centric, overlooking cultural and linguistic factors that shape real-world HC interactions. We introduce CoachLah, the first Singlish–English parallel corpus of HC conversations collected from a randomized controlled trial in Singapore. The dataset comprises 36,852 utterances transcribed from almost 160 hours of recorded HC sessions with 51 clients and 4 professional health coaches. Each dialogue is speaker-labeled, transcribed in Singlish, and aligned with high-quality English translations to preserve linguistic and cultural nuances. All sessions include HC summaries written by health coaches after each HC session, from which behavioral goals were manually annotated. To demonstrate the dataset’s utility, we benchmark two downstream tasks: (i) Singlish-to-English translation using fine-tuned open-weight models (e.g., Gemma-2-9B-it) with Low-Rank Adaptation, and (ii) behavioral goal extraction from unstructured HC summaries using span-based modeling (e.g., DeBERTa-v3-base). Together, these contributions establish the first culturally grounded benchmark for low-resource, goal-oriented dialogue research in HC. Both the code and the dataset are available at: https://github.com/IvaBojic/CoachLah.
Singlish to English Translation with Precision: A Dataset and Language Detection-Driven Masked Modeling for Singlish to English Translation
Sujit Kumar | Gerome Kusuma Ang | Stephanie Hilary Xinyi Ma | Andy Hau Yan Ho | Andy Khong
Proceedings of the Fifteenth Language Resources and Evaluation Conference
Singlish, a creole rooted in English and influenced by Singapore’s multilingual and multicultural environment, poses significant challenges for those proficient in standard English due to its unique and often complex lexical and syntactic structures. Despite significant advancements in language translation for both high- and low-resource languages, translating Singlish to English remains largely underexplored. This gap is primarily due to the lack of dedicated datasets for language detection and Singlish-to-English translation, as well as the absence of robust models capable of addressing the unique linguistic challenges posed by Singlish. In this work, we curate a word-level language detection dataset and a Singlish-to-English translation dataset, and propose a Language Detection-driven Masked Language Modelling approach for translating Singlish into English. We evaluate the performance of existing models and the proposed approach on two Singlish-to-English translation datasets, including our proposed SEAT dataset. The results demonstrate that the proposed LD-MLMTrans approach outperforms the baseline model and exhibits high proficiency in Singlish-to-English translation.
2025
SMARTMiner: Extracting and Evaluating SMART Goals from Low-Resource Health Coaching Notes
Iva Bojic | Qi Chwen Ong | Stephanie Hilary Xinyi Ma | Lin Ai | Zheng Liu | Ziwei Gong | Julia Hirschberg | Andy Hau Yan Ho | Andy W. H. Khong
Findings of the Association for Computational Linguistics: EMNLP 2025
We present SMARTMiner, a framework for extracting and evaluating specific, measurable, attainable, relevant, time-bound (SMART) goals from unstructured health coaching (HC) notes. Developed in response to challenges observed during a clinical trial, SMARTMiner performs two tasks: (i) extracting behavior change goal spans and (ii) categorizing their SMARTness. We also introduce SMARTSpan, the first publicly available dataset of 173 HC notes annotated with 266 goals and SMART attributes. SMARTMiner incorporates an extractive goal retriever with a component-wise SMARTness classifier. Experimental results show that extractive models significantly outperformed their generative counterparts in low-resource settings, and that two-stage fine-tuning substantially boosted performance. The SMARTness classifier achieved up to 0.91 SMART F1 score, while the full SMARTMiner maintained high end-to-end accuracy. This work bridges healthcare, behavioral science, and natural language processing to support health coaches and clients with structured goal tracking, paving the way for automated weekly goal reviews between human-led HC sessions. Both the code and the dataset are available at: https://github.com/IvaBojic/SMARTMiner.