Tomohiro Nishiyama
2026
Overview of the 11th Social Media Mining for Health (#SMM4H) and Health Real-World Data (HeaRD) Shared Tasks at ACL 2026
Guillermo Lopez-Garcia | Jose Miguel Acitores Cortina | Jacob Berkowitz | Joey Chan | Sumon Kanti Dey | Ivan Flores Amaro | Fernando Gallego | Lauren Gryboski | Ari Z. Klein | Farnoush Zeidi Kolehparcheh | Martin Krallinger | Salvador Lima-Lopez | Yujun Ma | Tomohiro Nishiyama | Ahmad Rezaie Mianroodi | Amirali Rezaie Mianroodi | Lisa Raithel | Roland Roller | Judith Rosell | Frank Rudzicz | Abeed Sarker | Nicholas Tatonetti | Philippe Thomas | Elena Tutubalina | Dongfang Xu | Farnaz Zeidi | Yu Zhai | Pierre Zweigenbaum | Graciela Gonzalez-Hernandez
Proceedings of the 11th Social Media Mining for Health Research and Applications (SMM4H-HeaRD 2026) Workshop and Shared Tasks
The aim of the Social Media Mining for Health Applications and Health Real-World Data (#SMM4H-HeaRD) shared tasks is to foster the development and evaluation of natural language processing, machine learning, and artificial intelligence methods for analyzing health-related text from social media and other real-world data sources. For the 11th iteration, held online and co-located with ACL 2026, the workshop continued the expanded #SMM4H-HeaRD platform initiated in 2025, broadening its scope beyond social media to include additional health real-world data sources such as clinical narratives and biomedical literature. The eight shared tasks covered diverse data sources, health domains (e.g., adverse drug events, insomnia, influenza vaccine effectiveness, cancer staging, substance use), and task formulations (e.g., classification, named entity recognition, span extraction, and text generation). In total, 110 teams registered, representing 31 countries. In this paper, we present an overview of the datasets, participant systems, and performance results, providing insights into current methods for mining social media and health real-world data for biomedical and clinical applications.
Exploring Novel Drug Research Area using Large Language Models Based on Research Trends in Biomedical Literature
Afnan Afnan | Michael Van Supranes | Tomohiro Nishiyama | Shoko Wakamiya | Eiji Aramaki
BioNLP 2026
The rapid expansion of biomedical literature makes manual identification of novel drug-disease relationships increasingly difficult. Existing approaches have leveraged large language models (LLMs) to mine abstracts or construct knowledge graphs for drug repurposing, but they face two key limitations: finite context windows hinder the capture of macro-level research trends, and single-pass black-box pipelines make outputs difficult to verify. This paper proposes a pipeline for discovering new drug targets by combining disease and drug research trends using LLMs. Our method extracts PICO components from PubMed abstracts and normalizes the Population and Intervention components to ICD and ATC codes, respectively. A temporal frequency delta matrix is constructed to capture shifts in publication counts from 2013 to 2022 and is then used to discover novel drug research areas. Compared with an abstract-based baseline, our approach showed qualitative signs of generating combinations that were more closely aligned with observed research trends and, in some cases, more clinically plausible. These findings suggest that structured trend information may be useful for LLM-based exploration, although the differences between the two methods were limited and the results remain preliminary. Future work will focus on validating the consistency and reliability of these candidates.
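The abstract does not specify how the temporal frequency delta matrix is computed. As a rough illustration only, the sketch below assumes that each normalized abstract yields one (year, ICD code, ATC code) tuple, and that each matrix cell is the publication count for an ICD/ATC pair in a later window minus its count in an earlier window. The function names `build_delta_matrix` and `top_rising`, the window split, and the example codes are hypothetical and are not taken from the paper.

```python
from collections import Counter

def build_delta_matrix(records, early, late):
    """Hypothetical sketch of a temporal frequency delta matrix.

    records: iterable of (year, icd_code, atc_code) tuples (assumed format).
    early, late: inclusive (start_year, end_year) windows.
    Returns (icd_rows, atc_cols, matrix) where matrix[r][c] is the count in
    the late window minus the count in the early window for that pair.
    """
    records = list(records)
    early_counts, late_counts = Counter(), Counter()
    for year, icd, atc in records:
        if early[0] <= year <= early[1]:
            early_counts[(icd, atc)] += 1
        elif late[0] <= year <= late[1]:
            late_counts[(icd, atc)] += 1
    icds = sorted({icd for _, icd, _ in records})
    atcs = sorted({atc for _, _, atc in records})
    # Counter returns 0 for unseen pairs, so empty cells get a delta of 0.
    matrix = [[late_counts[(i, a)] - early_counts[(i, a)] for a in atcs]
              for i in icds]
    return icds, atcs, matrix

def top_rising(icds, atcs, matrix, k):
    """Return up to k (icd, atc, delta) pairs with the largest positive delta."""
    cells = [(matrix[r][c], icds[r], atcs[c])
             for r in range(len(icds)) for c in range(len(atcs))]
    cells.sort(reverse=True)
    return [(icd, atc, d) for d, icd, atc in cells[:k] if d > 0]

if __name__ == "__main__":
    # Illustrative, made-up records; not data from the paper.
    records = [
        (2013, "E11", "A10BA02"),
        (2014, "E11", "A10BA02"),
        (2021, "E11", "A10BA02"),
        (2020, "G30", "A10BJ02"),
        (2022, "G30", "A10BJ02"),
    ]
    icds, atcs, m = build_delta_matrix(records, (2013, 2017), (2018, 2022))
    print(icds, atcs, m)
    print(top_rising(icds, atcs, m, k=2))
```

Under these assumptions, the rising pairs returned by `top_rising` could serve as structured trend context for an LLM prompt, in contrast to passing raw abstracts; whether the paper selects candidates this way is not stated in the abstract.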
2025
ARxHYOKA at TAQEEM2025: Comparative Approaches to Arabic Essay Trait Scoring
Mohamad Alnajjar | Ahmad Almoustafa | Tomohiro Nishiyama | Shoko Wakamiya | Eiji Aramaki | Takuya Matsuzaki
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
2024
A Dataset for Pharmacovigilance in German, French, and Japanese: Annotating Adverse Drug Reactions across Languages
Lisa Raithel | Hui-Syuan Yeh | Shuntaro Yada | Cyril Grouin | Thomas Lavergne | Aurélie Névéol | Patrick Paroubek | Philippe Thomas | Tomohiro Nishiyama | Sebastian Möller | Eiji Aramaki | Yuji Matsumoto | Roland Roller | Pierre Zweigenbaum
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
User-generated data sources have gained significance in uncovering Adverse Drug Reactions (ADRs), with an increasing number of discussions occurring in the digital world. However, the existing clinical corpora predominantly revolve around scientific articles in English. This work presents a multilingual corpus of texts concerning ADRs gathered from diverse sources, including patient fora, social media, and clinical reports in German, French, and Japanese. Our corpus contains annotations covering 12 entity types, four attribute types, and 13 relation types. It contributes to the development of real-world multilingual language models for healthcare. We provide statistics to highlight certain challenges associated with the corpus and conduct preliminary experiments resulting in strong baselines for extracting entities and relations between these entities, both within and across languages.
Assessing Authenticity and Anonymity of Synthetic User-generated Content in the Medical Domain
Tomohiro Nishiyama | Lisa Raithel | Roland Roller | Pierre Zweigenbaum | Eiji Aramaki
Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024)
Since medical text cannot be shared easily due to privacy concerns, synthetic data bears much potential for natural language processing applications. In the context of social media and user-generated messages about drug intake and adverse drug effects, this work presents different methods to examine the authenticity of synthetic text. We conclude that the generated tweets are untraceable and show enough authenticity from the medical point of view to be used as a replacement for a real Twitter corpus. However, original data might still be the preferred choice as they contain much more diversity.
Co-authors
- Eiji Aramaki 4
- Lisa Raithel 3
- Roland Roller 3
- Pierre Zweigenbaum 3
- Philippe Thomas 2
- Shoko Wakamiya 2
- Afnan Afnan 1
- Ahmad Almoustafa 1
- Mohamad Alnajjar 1
- Jacob Berkowitz 1
- Joey Chan 1
- Jose Cortina 1
- Sumon Kanti Dey 1
- Ivan Flores Amaro 1
- Fernando Gallego 1
- Graciela Gonzalez-Hernandez 1
- Cyril Grouin 1
- Lauren Gryboski 1
- Ari Z. Klein 1
- Martin Krallinger 1
- Thomas Lavergne 1
- Salvador Lima-Lopez 1
- Guillermo Lopez-Garcia 1
- Yujun Ma 1
- Yuji Matsumoto 1
- Takuya Matsuzaki 1
- Sebastian Möller 1
- Aurélie Névéol 1
- Patrick Paroubek 1
- Ahmad Rezaie Mianroodi 1
- Amirali Rezaie Mianroodi 1
- Judith Rosell 1
- Frank Rudzicz 1
- Abeed Sarker 1
- Nicholas Tatonetti 1
- Elena Tutubalina 1
- Michael Van Supranes 1
- Dongfang Xu 1
- Shuntaro Yada 1
- Hui-Syuan Yeh 1
- Farnaz Zeidi 1
- Farnoush Zeidi Kolehparcheh 1
- Yu Zhai 1