Binh Nguyen


2023

HyperRouter: Towards Efficient Training and Inference of Sparse Mixture of Experts
Truong Giang Do | Le Khiem | Quang Pham | TrungTin Nguyen | Thanh-Nam Doan | Binh Nguyen | Chenghao Liu | Savitha Ramasamy | Xiaoli Li | Steven Hoi
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

By routing input tokens to only a few split experts, Sparse Mixture-of-Experts has enabled efficient training of large language models. Recent findings suggest that fixing the routers can achieve competitive performance by alleviating the collapsing problem, where all experts eventually learn similar representations. However, this strategy has two key limitations: (i) the policy derived from random routers might be sub-optimal, and (ii) it requires extensive resources during training and evaluation, leading to limited efficiency gains. This work introduces HyperRouter, which dynamically generates the router’s parameters through a fixed hypernetwork and trainable embeddings, striking a balance between training the routers and freezing them, and thereby learning an improved routing policy. Extensive experiments across a wide range of tasks demonstrate the superior performance and efficiency gains of HyperRouter compared to existing routing methods. Our implementation is publicly available at https://github.com/giangdip2410/HyperRouter.
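A minimal PyTorch sketch of the core idea, under illustrative assumptions: a frozen hypernetwork maps a small trainable embedding to the router's weight matrix, and tokens are then gated to their top-k experts. The module and parameter names (HyperRouter, emb_dim, top_k) are hypothetical and not taken from the released implementation.

```python
import torch
import torch.nn as nn

class HyperRouter(nn.Module):
    """Illustrative sketch (not the paper's exact code): a frozen
    hypernetwork generates the router's weights from a small trainable
    embedding, so the routing policy adapts without training the router
    weights directly."""

    def __init__(self, d_model: int, num_experts: int, emb_dim: int = 32, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.num_experts, self.d_model = num_experts, d_model
        # Trainable embedding: the only learned routing parameter.
        self.router_emb = nn.Parameter(torch.randn(emb_dim))
        # Fixed hypernetwork: maps the embedding to (num_experts x d_model) router weights.
        self.hypernet = nn.Linear(emb_dim, num_experts * d_model)
        for p in self.hypernet.parameters():
            p.requires_grad_(False)  # frozen, as in the paper's fixed hypernetwork

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq, d_model) -> mixture weights: (batch, seq, num_experts)
        w = self.hypernet(self.router_emb).view(self.num_experts, self.d_model)
        logits = tokens @ w.t()
        # Keep only the top-k experts per token (standard sparse MoE gating).
        topk_val, topk_idx = logits.topk(self.top_k, dim=-1)
        gates = torch.full_like(logits, float("-inf")).scatter(-1, topk_idx, topk_val)
        return gates.softmax(dim=-1)  # zeros for all non-selected experts

if __name__ == "__main__":
    router = HyperRouter(d_model=64, num_experts=8)
    x = torch.randn(2, 10, 64)
    print(router(x).shape)  # torch.Size([2, 10, 8])
```

Because only the small embedding receives gradients while the hypernetwork stays fixed, this sits between fully trained routers and fully frozen random ones, which is the trade-off the abstract describes.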

ViASR: A Novel Benchmark Dataset and Methods for Vietnamese Automatic Speech Recognition
Binh Nguyen | Son Huynh | Quoc Khanh Tran | An Le Tran-Hoai | Trong An Nguyen | Nguyen Tung Doan Tran | Thuy An Phan Thi | Le Thanh Nguyen | Hieu Nghia Nguyen | Dang Huynh
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation

2022

Multi-level Community-awareness Graph Neural Networks for Neural Machine Translation
Binh Nguyen | Long Nguyen | Dien Dinh
Proceedings of the 29th International Conference on Computational Linguistics

Neural Machine Translation (NMT) aims to translate text from the source language to the target language while preserving the original meaning. Linguistic information such as morphology, syntax, and semantics should be captured in token embeddings to produce a high-quality translation. Recent works have leveraged the powerful Graph Neural Networks (GNNs) to encode such linguistic knowledge into token embeddings. Specifically, they use a trained parser to construct semantic graphs from sentences and then apply GNNs. However, most semantic graphs are tree-shaped and too sparse for GNNs, which causes the over-smoothing problem. To alleviate this problem, we propose a novel Multi-level Community-awareness Graph Neural Network (MC-GNN) layer that jointly models local and global relationships between words and their linguistic roles across multiple communities. Intuitively, the MC-GNN layer replaces a self-attention layer on the encoder side of a transformer-based machine translation model. Extensive experiments on four language-pair datasets with common evaluation metrics show remarkable improvements from our method while reducing time complexity on very long sentences.
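A minimal PyTorch sketch of the layer described above, under illustrative assumptions: local mean aggregation over the parser-derived semantic graph is combined with a global, community-pooled signal, and a residual connection lets the layer stand in for encoder self-attention. The class and argument names (CommunityAwareGNNLayer, adj, comm) and the exact two-level mixing are assumptions, not the paper's formulation.

```python
import torch
import torch.nn as nn

class CommunityAwareGNNLayer(nn.Module):
    """Illustrative sketch (not the paper's exact MC-GNN): mixes local
    message passing over a sparse semantic graph with a global signal
    pooled within word communities."""

    def __init__(self, d_model: int):
        super().__init__()
        self.local_proj = nn.Linear(d_model, d_model)
        self.global_proj = nn.Linear(d_model, d_model)
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, adj, comm):
        # x: (seq, d_model) token embeddings
        # adj: (seq, seq) adjacency of the parser-derived semantic graph
        # comm: (seq,) long tensor of community ids (e.g. from graph clustering)
        deg = adj.sum(-1, keepdim=True).clamp(min=1)
        local = self.local_proj(adj @ x / deg)  # mean over graph neighbours
        # Mean-pool each community and broadcast back to its members: a dense,
        # global signal that counteracts over-smoothing on sparse, tree-shaped graphs.
        num_comm = int(comm.max()) + 1
        onehot = torch.zeros(x.size(0), num_comm).scatter(1, comm.unsqueeze(1), 1.0)
        pooled = (onehot.t() @ x) / onehot.sum(0).unsqueeze(1).clamp(min=1)
        global_ = self.global_proj(onehot @ pooled)
        return self.out(torch.relu(local + global_)) + x  # residual, as in transformers

# Example: 5 tokens, a chain-shaped (tree-like) graph, two communities.
layer = CommunityAwareGNNLayer(d_model=16)
x = torch.randn(5, 16)
adj = torch.diag(torch.ones(4), 1) + torch.diag(torch.ones(4), -1)
comm = torch.tensor([0, 0, 0, 1, 1])
print(layer(x, adj, comm).shape)  # torch.Size([5, 16])
```

Unlike full self-attention, the cost here is dominated by the sparse graph and a per-community pooling rather than a dense seq-by-seq interaction, which is consistent with the abstract's claim of reduced time complexity on very long sentences.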