Dang Van Thin



2025

LoveHeaven at MAHED 2025: Text-based Hate and Hope Speech Classification Using AraBERT-Twitter Ensemble
Nguyễn Thiên Bảo | Dang Van Thin
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

Baoflowin502 at MAHED Shared Task: Text-based Hate and Hope Speech Classification
Nguyen Minh Bao | Dang Van Thin
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks

Few-Shot Coreference Resolution with Semantic Difficulty Metrics and In-Context Learning
Nguyen Xuan Phuc | Dang Van Thin
Proceedings of the Eighth Workshop on Computational Models of Reference, Anaphora and Coreference

This paper presents our submission to the CRAC 2025 Shared Task on Multilingual Coreference Resolution in the LLM track. We propose a prompt-based few-shot coreference resolution system where the final inference is performed by Grok-3 using in-context learning. The core of our methodology is a difficulty-aware sample selection pipeline that leverages Gemini Flash 2.0 to compute semantic difficulty metrics, including mention dissimilarity and pronoun ambiguity. By identifying and selecting the most challenging training samples for each language, we construct highly informative prompts to guide Grok-3 in predicting coreference chains and reconstructing zero anaphora. Our approach secured 3rd place in the CRAC 2025 shared task.
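The difficulty-aware selection step described in the abstract could be sketched roughly as follows. This is an illustrative reconstruction, not the authors' code: the metric weighting, field names, and prompt template are all assumptions, and the difficulty metrics (which the paper computes with Gemini Flash 2.0) are stand-in floats here.

```python
def difficulty_score(sample):
    """Combine the two semantic difficulty metrics into one score.

    In the paper these metrics are LLM-computed; here they are
    pre-filled floats, and the equal weighting is an assumption.
    """
    return 0.5 * sample["mention_dissimilarity"] + 0.5 * sample["pronoun_ambiguity"]


def select_demonstrations(train_samples, k=2):
    """Pick the k most difficult training samples as in-context demos."""
    return sorted(train_samples, key=difficulty_score, reverse=True)[:k]


def build_prompt(demos, query_text):
    """Assemble a few-shot prompt: hard demonstrations first, then the query."""
    parts = ["Resolve coreference chains in the text below."]
    for d in demos:
        parts.append(f"Text: {d['text']}\nChains: {d['chains']}")
    parts.append(f"Text: {query_text}\nChains:")
    return "\n\n".join(parts)


# Toy training pool with pre-computed (hypothetical) difficulty metrics.
train = [
    {"text": "Ann met Bob. She smiled.", "chains": "[Ann, She]",
     "mention_dissimilarity": 0.2, "pronoun_ambiguity": 0.1},
    {"text": "Ann met Eve. She left.", "chains": "[Ann, She]",
     "mention_dissimilarity": 0.4, "pronoun_ambiguity": 0.9},
]

demos = select_demonstrations(train, k=1)  # the ambiguous-pronoun sample
prompt = build_prompt(demos, "Tom saw Jim. He waved.")
```

The intuition, as the abstract frames it, is that the hardest examples carry the most signal for in-context learning, so they are the ones worth spending prompt budget on.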

Automotive Document Labeling Using Large Language Models
Dang Van Thin | Cuong Xuan Chu | Christian Graf | Tobias Kaminski | Trung-Kien Tran
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track

Repairing and maintaining car parts are crucial tasks in the automotive industry, requiring a mechanic to have all relevant technical documents available. However, retrieving the right documents from a huge database heavily depends on domain expertise and is time-consuming and error-prone. By labeling available documents according to the components they relate to, concise and accurate information can be retrieved efficiently. However, this is a challenging task, as the relevance of a document to a particular component strongly depends on the context and the expertise of the domain specialist. Moreover, component terminology varies widely between different manufacturers. We address these challenges by utilizing Large Language Models (LLMs) to enrich and unify a component database via web mining, extracting relevant keywords, and leveraging hybrid search and LLM-based re-ranking to select the most relevant component for a document. We systematically evaluate our method using various LLMs on an expert-annotated dataset and demonstrate that it outperforms the baselines, which rely solely on LLM prompting.
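The retrieve-then-rerank pipeline in the abstract could be sketched as below. Everything concrete here is hypothetical: the component names, the scores, the linear blending of lexical and semantic relevance, and the simulated re-ranking callable all stand in for the paper's actual search backend and LLM.

```python
def hybrid_score(keyword_score, vector_score, alpha=0.5):
    """Blend lexical (keyword) and semantic (vector) relevance.

    A weighted sum is one common hybrid-search fusion; the paper
    does not specify its fusion rule, so this is an assumption.
    """
    return alpha * keyword_score + (1 - alpha) * vector_score


def retrieve(components, query_scores, top_k=3):
    """Rank candidate components for a document by hybrid score."""
    ranked = sorted(
        components,
        key=lambda c: hybrid_score(*query_scores[c]),
        reverse=True,
    )
    return ranked[:top_k]


def rerank_with_llm(candidates, llm_choice):
    """Stand-in for LLM-based re-ranking: an LLM picks the best
    candidate from the shortlist (simulated here by a callable)."""
    return llm_choice(candidates)


# Made-up (keyword_score, vector_score) pairs for one query document.
components = ["brake caliper", "brake pad", "wheel bearing"]
scores = {
    "brake caliper": (0.9, 0.8),
    "brake pad": (0.7, 0.9),
    "wheel bearing": (0.2, 0.3),
}

shortlist = retrieve(components, scores)
best = rerank_with_llm(shortlist, lambda cs: cs[0])
```

The design point is that the cheap hybrid search narrows a huge component database to a shortlist, so the comparatively expensive LLM re-ranker only ever sees a handful of candidates per document.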

Exploring the Power of Large Language Models for Vietnamese Implicit Sentiment Analysis
Huy Gia Luu | Dang Van Thin
Proceedings of the 18th International Natural Language Generation Conference

We present the first benchmark for implicit sentiment analysis (ISA) in Vietnamese, aimed at evaluating large language models (LLMs) on their ability to interpret implicit sentiment, accompanied by ViISA, a dataset specifically constructed for this task. We assess a variety of open-source and closed-source LLMs using state-of-the-art (SOTA) prompting techniques. While LLMs achieve strong recall, they often misclassify implicit cues such as sarcasm and exaggeration, resulting in low precision. Through detailed error analysis, we highlight key challenges and suggest improvements to Chain-of-Thought prompting via more contextually aligned demonstrations.

twinhter at LeWiDi-2025: Integrating Annotator Perspectives into BERT for Learning with Disagreements
Nguyen Huu Dang Nguyen | Dang Van Thin
Proceedings of the 4th Workshop on Perspectivist Approaches to NLP

Annotator-provided information during labeling can reflect differences in how texts are understood and interpreted, though such variation may also arise from inconsistencies or errors. To make use of this information, we build a BERT-based model that integrates annotator perspectives and evaluate it on four datasets from the third edition of the Learning With Disagreements (LeWiDi) shared task. For each original data point, we create a new (text, annotator) pair, optionally modifying the text to reflect the annotator’s perspective when additional information is available. The text and annotator features are embedded separately and concatenated before classification, enabling the model to capture individual interpretations of the same input. Our model achieves first place on both tasks for the Par and VariErrNLI datasets. More broadly, it performs very well on datasets where annotators provide rich information and the number of annotators is relatively small, while still maintaining competitive results on datasets with limited annotator information and a larger annotator pool.
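The core modeling idea in the abstract, embedding the text and the annotator separately and concatenating before classification, could be sketched as follows. This is a minimal toy, not the authors' system: the real model uses BERT text encodings, whereas here the embedding functions, dimensions, and the linear head are all illustrative stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)


def embed_text(text, dim=8):
    """Stand-in for a BERT sentence embedding (hash-seeded random)."""
    local = np.random.default_rng(abs(hash(text)) % (2**32))
    return local.normal(size=dim)


def embed_annotator(annotator_id, table):
    """Look up a (here randomly initialized) per-annotator embedding."""
    return table[annotator_id]


def classify(features, weights, bias=0.0):
    """Linear head over the concatenated [text; annotator] features."""
    return float(features @ weights + bias) > 0


# One embedding per annotator, so the same text can be classified
# differently depending on who is reading it.
annotators = {"a1": rng.normal(size=4), "a2": rng.normal(size=4)}
weights = rng.normal(size=12)  # 8 text dims + 4 annotator dims

pair = np.concatenate([embed_text("the joke was funny"),
                       embed_annotator("a1", annotators)])
label = classify(pair, weights)
```

The (text, annotator) pairing is what lets one shared model produce annotator-specific predictions, which is the point of the LeWiDi setting: disagreement is modeled rather than averaged away.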

Bosch@AI_Team at LegalSML 2025: Vietnamese Legal Small Language with Domain Adaptation and Aspect-based Data Synthesis
Tran Minh Quang | Nguyen Xuan Phi | Nguyen Van Tai | Phan Minh Toan | Dang Van Thin
Proceedings of the 11th International Workshop on Vietnamese Language and Speech Processing

UIT-NTTT at VLSP2025: A Prompt Engineering Approach for Date Arithmetic Reasoning in Vietnamese
Khoa Nguyen-Anh Le | Dang Van Thin
Proceedings of the 11th International Workshop on Vietnamese Language and Speech Processing

Bosch@AI_Team at MMT 2025: Medical Machine Translation by Bidirectional Training with Small Language Models
Phan Minh Toan | Nguyen Xuan Phi | Nguyen Van Tai | Trang Minh Quang | Dang Van Thin
Proceedings of the 11th International Workshop on Vietnamese Language and Speech Processing

Metamorphic at VLSP 2025: SIGMA – A Multimodal Agent System for Legal QA on Vietnamese Traffic Signs
Nguyen Tuan Kiet | Nguyen Khanh Tuan Anh | Long Hoang Huu Nguyen | Dam Vu Trong Tai | Dang Van Thin
Proceedings of the 11th International Workshop on Vietnamese Language and Speech Processing

2024

Prompt Engineering with Large Language Models for Vietnamese Sentiment Classification
Dang Van Thin | Duong Ngoc Hao | Ngan Luu-Thuy Nguyen
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation

NRK at SemEval-2024 Task 1: Semantic Textual Relatedness through Domain Adaptation and Ensemble Learning on BERT-based models
Nguyen Tuan Kiet | Dang Van Thin
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)

This paper describes the system of team NRK for Task A in SemEval-2024 Task 1: Semantic Textual Relatedness (STR). We focus on exploring the performance of ensemble architectures based on the voting technique and different pre-trained transformer-based language models, including multilingual and monolingual BERTology models. The experimental results show that our system achieved competitive performance in several languages in Track A: Supervised, where our submissions rank in the Top 3 and Top 4 for the Algerian Arabic and Amharic languages, respectively. Our source code is released on GitHub.
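The voting ensemble mentioned in the abstract could be sketched as a simple majority vote over per-model outputs. This is a toy illustration under assumed settings: the model names are placeholders, the outputs are simulated, and the task is treated as discrete label prediction for clarity.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label most models agree on.

    Counter.most_common breaks ties by first-seen order, so with an
    even split the earlier model's label wins; a real system might
    weight models by validation performance instead.
    """
    return Counter(predictions).most_common(1)[0][0]


# Simulated outputs from an ensemble of BERT-based models for one
# sentence pair (labels and model names are illustrative).
model_outputs = {
    "multilingual-bert": "related",
    "monolingual-bert": "related",
    "xlm-r": "unrelated",
}

final = majority_vote(list(model_outputs.values()))
```

Voting lets heterogeneous multilingual and monolingual models contribute equally without any joint training, which suits a shared-task setting where per-language model quality varies widely.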