2025
LoveHeaven at MAHED 2025: Text-based Hate and Hope Speech Classification Using AraBERT-Twitter Ensemble
Nguyễn Thiên Bảo | Dang Van Thin
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
Baoflowin502 at MAHED Shared Task: Text-based Hate and Hope Speech Classification
Nguyen Minh Bao | Dang Van Thin
Proceedings of The Third Arabic Natural Language Processing Conference: Shared Tasks
Automotive Document Labeling Using Large Language Models
Dang Van Thin | Cuong Xuan Chu | Christian Graf | Tobias Kaminski | Trung-Kien Tran
Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track
Repairing and maintaining car parts are crucial tasks in the automotive industry, requiring a mechanic to have all relevant technical documents available. However, retrieving the right documents from a huge database depends heavily on domain expertise and is time-consuming and error-prone. Labeling the available documents according to the components they relate to allows concise and accurate information to be retrieved efficiently. Yet this labeling is itself challenging, as the relevance of a document to a particular component depends strongly on the context and the expertise of the domain specialist; moreover, component terminology varies widely between manufacturers. We address these challenges by utilizing Large Language Models (LLMs) to enrich and unify a component database via web mining, extracting relevant keywords, and leveraging hybrid search and LLM-based re-ranking to select the most relevant component for a document. We systematically evaluate our method with various LLMs on an expert-annotated dataset and demonstrate that it outperforms baselines that rely solely on LLM prompting.
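The retrieve-then-rerank pipeline named in the abstract can be illustrated as follows. This is a minimal sketch, not the paper's implementation: Component, lexical_score, hybrid_candidates, and llm_rerank are hypothetical names, the keyword-overlap score stands in for a full hybrid (sparse plus dense) retriever, and the re-ranking step is a placeholder for an actual LLM call.

```python
# Minimal sketch of the hybrid-search + LLM re-ranking idea; all names here
# are illustrative assumptions, not the paper's actual implementation.
from dataclasses import dataclass

@dataclass
class Component:
    name: str
    keywords: set  # in the paper, enriched and unified via web mining

def lexical_score(doc_tokens: set, comp: Component) -> float:
    """Keyword-overlap score; stands in for a hybrid sparse + dense retriever."""
    if not comp.keywords:
        return 0.0
    return len(doc_tokens & comp.keywords) / len(comp.keywords)

def hybrid_candidates(doc: str, components: list, k: int = 3) -> list:
    """Retrieve the top-k candidate components for a document."""
    tokens = set(doc.lower().split())
    return sorted(components, key=lambda c: lexical_score(tokens, c), reverse=True)[:k]

def llm_rerank(doc: str, candidates: list) -> Component:
    """Placeholder for the LLM-based re-ranking step: a real system would
    prompt the model with the document and the candidate list."""
    return candidates[0]  # stub: keep the top hybrid-search candidate

components = [
    Component("brake caliper", {"brake", "caliper", "pad", "disc"}),
    Component("alternator", {"alternator", "charging", "belt", "voltage"}),
]
doc = "Replace worn brake pads and inspect the caliper piston."
print(llm_rerank(doc, hybrid_candidates(doc, components)).name)  # -> brake caliper
```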
twinhter at LeWiDi-2025: Integrating Annotator Perspectives into BERT for Learning with Disagreements
Nguyen Huu Dang Nguyen | Dang Van Thin
Proceedings of the 4th Workshop on Perspectivist Approaches to NLP
Annotator-provided information during labeling can reflect differences in how texts are understood and interpreted, though such variation may also arise from inconsistencies or errors. To make use of this information, we build a BERT-based model that integrates annotator perspectives and evaluate it on four datasets from the third edition of the Learning With Disagreements (LeWiDi) shared task. For each original data point, we create a new (text, annotator) pair, optionally modifying the text to reflect the annotator’s perspective when additional information is available. The text and annotator features are embedded separately and concatenated before classification, enabling the model to capture individual interpretations of the same input. Our model achieves first place on both tasks for the Par and VariErrNLI datasets. More broadly, it performs very well on datasets where annotators provide rich information and the number of annotators is relatively small, while still maintaining competitive results on datasets with limited annotator information and a larger annotator pool.
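The concatenation architecture described above lends itself to a compact model. A minimal sketch, assuming a HuggingFace BERT encoder and a learned annotator-ID embedding table; AnnotatorAwareClassifier and all layer sizes are illustrative, not the authors' exact configuration.

```python
# Sketch of a (text, annotator)-pair classifier: text and annotator features
# are embedded separately and concatenated before classification.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class AnnotatorAwareClassifier(nn.Module):
    def __init__(self, n_annotators: int, n_labels: int, ann_dim: int = 32):
        super().__init__()
        self.bert = AutoModel.from_pretrained("bert-base-uncased")
        self.ann_emb = nn.Embedding(n_annotators, ann_dim)  # one vector per annotator
        self.classifier = nn.Linear(self.bert.config.hidden_size + ann_dim, n_labels)

    def forward(self, input_ids, attention_mask, annotator_ids):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]   # [CLS] representation of the text
        ann = self.ann_emb(annotator_ids)   # representation of the annotator
        return self.classifier(torch.cat([cls, ann], dim=-1))

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
batch = tok(["an example post"], return_tensors="pt")
model = AnnotatorAwareClassifier(n_annotators=10, n_labels=2)
logits = model(batch["input_ids"], batch["attention_mask"], torch.tensor([3]))
```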
2024
Prompt Engineering with Large Language Models for Vietnamese Sentiment Classification
Dang Van Thin | Duong Ngoc Hao | Ngan Luu-Thuy Nguyen
Proceedings of the 38th Pacific Asia Conference on Language, Information and Computation
NRK at SemEval-2024 Task 1: Semantic Textual Relatedness through Domain Adaptation and Ensemble Learning on BERT-based models
Nguyen Tuan Kiet | Dang Van Thin
Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)
This paper describes the NRK team's system for Track A of SemEval-2024 Task 1: Semantic Textual Relatedness (STR). We focus on exploring the performance of ensemble architectures based on a voting technique over different pre-trained transformer-based language models, including multilingual and monolingual BERTology models. The experimental results show that our system achieves competitive performance in several languages of Track A: Supervised, with our submissions ranking in the Top 3 and Top 4 for the Algerian Arabic and Amharic languages, respectively. Our source code is released on GitHub.
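A minimal sketch of the voting-style ensembling named above; since STR relatedness scores are continuous, score averaging is shown alongside hard label voting as the usual analogue. Both functions are illustrative assumptions, not the team's released code.

```python
# Combining several models' predictions: hard voting for class labels,
# averaging for continuous relatedness scores.
from collections import Counter
from statistics import mean

def hard_vote(labels_per_model):
    """Majority vote over per-example class labels from several models."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*labels_per_model)]

def score_average(scores_per_model):
    """For continuous STR scores, averaging is the usual analogue of voting."""
    return [mean(scores) for scores in zip(*scores_per_model)]

print(hard_vote([[1, 0, 2], [1, 1, 2], [1, 0, 1]]))          # -> [1, 0, 2]
print(score_average([[0.8, 0.1], [0.6, 0.3], [0.7, 0.2]]))   # -> approx. [0.7, 0.2]
```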