2024
pdf
abs
MARIO: MAth Reasoning with code Interpreter Output - A Reproducible Pipeline
Minpeng Liao
|
Chengxi Li
|
Wei Luo
|
Wu Jing
|
Kai Fan
Findings of the Association for Computational Linguistics ACL 2024
Large language models (LLMs) have significantly improved in understanding natural language but still lack in mathematical reasoning, a hurdle on the path to true artificial general intelligence. The training of large language models, based on next-token prediction, struggles to capture the precise nature of mathematical reasoning, presenting both practical and theoretical challenges. In this paper, we address this challenge by enriching the data landscape and introducing a reasonable data format, enhanced the text analysis of the LLM with a capability to utilize a Python code interpreter. This dataset is derived from GSM8K and MATH and has been further refined through a combination of GPT annotations, human review, and self-training processes. Additionally, we propose a tentative, easily replicable protocol for the fine-tuning of math-specific LLMs, which has led to a significant improvement in the performance of a 7B-parameter LLM on the GSM8K and MATH datasets. A solution generator and a value estimator are fine-tuned simultaneously in a multi-task fashion, while an outlier-free value model-based inference method is proposed to further boost the performance. We are committed to advancing the field of mathematical reasoning in LLMs and, to that end, we will make the source code and checkpoints publicly available.
pdf
abs
Beyond Linguistic Cues: Fine-grained Conversational Emotion Recognition via Belief-Desire Modelling
Bo Xu
|
Longjiao Li
|
Wei Luo
|
Mehdi Naseriparsa
|
Zhehuan Zhao
|
Hongfei Lin
|
Feng Xia
Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Emotion recognition in conversation (ERC) is essential for dialogue systems to identify the emotions expressed by speakers. Although previous studies have made significant progress, accurate recognition and interpretation of similar fine-grained emotion properly accounting for individual variability remains a challenge. One particular under-explored area is the role of individual beliefs and desires in modelling emotion. Inspired by the Belief-Desire Theory of Emotion, we propose a novel method for conversational emotion recognition that incorporates both belief and desire to accurately identify emotions. We extract emotion-eliciting events from utterances and construct graphs that represent beliefs and desires in conversations. By applying message passing between nodes, our graph effectively models the utterance context, speaker’s global state, and the interaction between emotional beliefs, desires, and utterances. We evaluate our model’s performance by conducting extensive experiments on four popular ERC datasets and comparing it with multiple state-of-the-art models. The experimental results demonstrate the superiority of our proposed model and validate the effectiveness of each module in the model.
2023
pdf
abs
Adaptive Policy with Wait-k Model for Simultaneous Translation
Libo Zhao
|
Kai Fan
|
Wei Luo
|
Wu Jing
|
Shushu Wang
|
Ziqian Zeng
|
Zhongqiang Huang
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Simultaneous machine translation (SiMT) requires a robust read/write policy in conjunction with a high-quality translation model. Traditional methods rely on either a fixed wait-k policy coupled with a standalone wait-k translation model, or an adaptive policy jointly trained with the translation model. In this study, we propose a more flexible approach by decoupling the adaptive policy model from the translation model. Our motivation stems from the observation that a standalone multi-path wait-k model performs competitively with adaptive policies utilized in state-of-the-art SiMT approaches. Specifically, we introduce DaP, a divergence-based adaptive policy, that makes read/write decisions for any translation model based on the potential divergence in translation distributions resulting from future information. DaP extends a frozen wait-k model with lightweight parameters, and is both memory and computation efficient. Experimental results across various benchmarks demonstrate that our approach offers an improved trade-off between translation accuracy and latency, outperforming strong baselines.
pdf
abs
Better Simultaneous Translation with Monotonic Knowledge Distillation
Shushu Wang
|
Jing Wu
|
Kai Fan
|
Wei Luo
|
Jun Xiao
|
Zhongqiang Huang
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Simultaneous machine translation (SiMT) presents a unique challenge as it requires generating target tokens before the source sentence is fully consumed. This can lead to the hallucination problem, where target tokens are generated without support from the source sentence. The prefix-to-prefix training data used to train SiMT models are not always parallel, due to divergent word order between the source and target languages, and can contribute to the problem. In this paper, we propose a novel approach that leverages traditional translation models as teachers and employs a two-stage beam search algorithm to generate monotonic yet accurate reference translations for sequence-level knowledge distillation. Experimental results demonstrate the significant improvements achieved by our approach over multiple strong SiMT baselines, leading to new state-of-the-art performance across various language pairs. Notably, when evaluated on a monotonic version of the WMT15 De-En test set, which includes references generated in a more monotonic style by professional translators, our approach achieves even more substantial improvement over the baselines. The source code and data are publicly available for further exploration.
2022
pdf
abs
Discrete Cross-Modal Alignment Enables Zero-Shot Speech Translation
Chen Wang
|
Yuchen Liu
|
Boxing Chen
|
Jiajun Zhang
|
Wei Luo
|
Zhongqiang Huang
|
Chengqing Zong
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing
End-to-end Speech Translation (ST) aims at translating the source language speech into target language text without generating the intermediate transcriptions. However, the training of end-to-end methods relies on parallel ST data, which are difficult and expensive to obtain. Fortunately, the supervised data for automatic speech recognition (ASR) and machine translation (MT) are usually more accessible, making zero-shot speech translation a potential direction. Existing zero-shot methods fail to align the two modalities of speech and text into a shared semantic space, resulting in much worse performance compared to the supervised ST methods. In order to enable zero-shot ST, we propose a novel Discrete Cross-Modal Alignment (DCMA) method that employs a shared discrete vocabulary space to accommodate and match both modalities of speech and text. Specifically, we introduce a vector quantization module to discretize the continuous representations of speech and text into a finite set of virtual tokens, and use ASR data to map corresponding speech and text to the same virtual token in a shared codebook. This way, source language speech can be embedded in the same semantic space as the source language text, which can be then transformed into target language text with an MT module. Experiments on multiple language pairs demonstrate that our zero-shot ST method significantly improves the SOTA, and even performers on par with the strong supervised ST baselines.
2021
pdf
abs
Mutual-Learning Improves End-to-End Speech Translation
Jiawei Zhao
|
Wei Luo
|
Boxing Chen
|
Andrew Gilman
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing
A currently popular research area in end-to-end speech translation is the use of knowledge distillation from a machine translation (MT) task to improve the speech translation (ST) task. However, such scenario obviously only allows one way transfer, which is limited by the performance of the teacher model. Therefore, We hypothesis that the knowledge distillation-based approaches are sub-optimal. In this paper, we propose an alternative–a trainable mutual-learning scenario, where the MT and the ST models are collaboratively trained and are considered as peers, rather than teacher/student. This allows us to improve the performance of end-to-end ST more effectively than with a teacher-student paradigm. As a side benefit, performance of the MT model also improves. Experimental results show that in our mutual-learning scenario, models can effectively utilise the auxiliary information from peer models and achieve compelling results on Must-C dataset.
2018
pdf
abs
IRCMS at SemEval-2018 Task 7 : Evaluating a basic CNN Method and Traditional Pipeline Method for Relation Classification
Zhongbo Yin
|
Zhunchen Luo
|
Wei Luo
|
Mao Bin
|
Changhai Tian
|
Yuming Ye
|
Shuai Wu
Proceedings of the 12th International Workshop on Semantic Evaluation
This paper presents our participation for sub-task1 (1.1 and 1.2) in SemEval 2018 task 7: Semantic Relation Extraction and Classification in Scientific Papers (Gábor et al., 2018). We experimented on this task with two methods: CNN method and traditional pipeline method. We use the context between two entities (included) as input information for both methods, which extremely reduce the noise effect. For the CNN method, we construct a simple convolution neural network to automatically learn features from raw texts without any manual processing. Moreover, we use the softmax function to classify the entity pair into a specific relation category. For the traditional pipeline method, we use the Hackabout method as a representation which is described in section3.5. The CNN method’s result is much better than traditional pipeline method (49.1% vs. 42.3% and 71.1% vs. 54.6% ).
2016
pdf
Speculation and Negation Scope Detection via Convolutional Neural Networks
Zhong Qian
|
Peifeng Li
|
Qiaoming Zhu
|
Guodong Zhou
|
Zhunchen Luo
|
Wei Luo
Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing
2010
pdf
The ICT statistical machine translation system for IWSLT 2010
Hao Xiong
|
Jun Xie
|
Hui Yu
|
Kai Liu
|
Wei Luo
|
Haitao Mi
|
Yang Liu
|
Yajuan Lü
|
Qun Liu
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign
2002
pdf
Medstract: creating large-scale information servers from biomedical texts
James Pustejovsky
|
José Castaño
|
Roser Saurí
|
Jason Zhang
|
Wei Luo
Proceedings of the ACL-02 Workshop on Natural Language Processing in the Biomedical Domain