Ye Tian


2024

pdf
Improving LLM Generations via Fine-Grained Self-Endorsement
Ante Wang | Linfeng Song | Baolin Peng | Lifeng Jin | Ye Tian | Haitao Mi | Jinsong Su | Dong Yu
Findings of the Association for Computational Linguistics: ACL 2024

This work studies mitigating fact-conflicting hallucinations for large language model (LLM) at inference time.Particularly, we propose a self-endorsement framework that leverages the fine-grained fact-level comparisons across multiple sampled responses.Compared with prior ensemble methods (e.g., self-consistency) that perform response-level selection, our approach can better alleviate hallucinations for knowledge-intensive tasks.Our approach can broadly benefit smaller and open-source LLMs as it mainly conducts simple content-based comparisons.Experiments on Biographies show that our method can effectively improve the factuality of generations with simple and intuitive prompts across different scales of LLMs.Besides, comprehensive analyses on TriviaQA and GSM8K demonstrate the potential of self-endorsement for broader application.

pdf
Self-Consistency Boosts Calibration for Math Reasoning
Ante Wang | Linfeng Song | Ye Tian | Baolin Peng | Lifeng Jin | Haitao Mi | Jinsong Su | Dong Yu
Findings of the Association for Computational Linguistics: EMNLP 2024

pdf
Self-Alignment for Factuality: Mitigating Hallucinations in LLMs via Self-Evaluation
Xiaoying Zhang | Baolin Peng | Ye Tian | Jingyan Zhou | Lifeng Jin | Linfeng Song | Haitao Mi | Helen Meng
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Despite showing impressive abilities, large language models (LLMs) often struggle with factual inaccuracies, i.e., ”hallucinations”, even when they hold relevant knowledge. To mitigate these hallucinations, current approaches typically necessitate high-quality human factuality annotations. In this work, we explore Self-Alignment for Factuality, where we leverage the self-evaluation capability of an LLM to provide training signals that steer the model towards factuality. Specifically, we incorporate Self-Eval, a self-evaluation component, to prompt an LLM to validate the factuality of its own generated responses solely based on its internal knowledge. Additionally, we design Self-Knowledge Tuning (SK-Tuning) to augment the LLM’s self-evaluation ability by improving the model’s confidence estimation and calibration. We then utilize these self-annotated responses to fine-tune the model via Direct Preference Optimization algorithm. We show that the proposed self-alignment approach substantially enhances factual accuracy over Llama family models across three key knowledge-intensive tasks on TruthfulQA and BioGEN.

2023

pdf
Learning From Free-Text Human Feedback – Collect New Datasets Or Extend Existing Ones?
Dominic Petrak | Nafise Moosavi | Ye Tian | Nikolai Rozanov | Iryna Gurevych
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Continuous learning from free-text human feedback, such as error corrections, new knowledge, or alternative responses, is essential for today’s chatbots and virtual assistants to stay up-to-date, engaging, and socially acceptable. However, for research on methods for learning from such data, annotated data is scarce. To address this, we examine the error and user response types of six popular dialogue datasets from various types, including MultiWoZ, PersonaChat, Wizards-of-Wikipedia, and others, to assess their extendibility with the needed annotations. For this corpus study, we manually annotate a subset of each dataset with error and user response types using an improved version of the Integrated Error Taxonomy and a newly proposed user response type taxonomy. We provide the resulting dataset (EURTAD) to the community. Our findings provide new insights into dataset composition, including error types, user response types, and the relations between them.

pdf
How Are Idioms Processed Inside Transformer Language Models?
Ye Tian | Isobel James | Hye Son
Proceedings of the 12th Joint Conference on Lexical and Computational Semantics (*SEM 2023)

Idioms such as “call it a day” and “piece of cake,” are prevalent in natural language. How do Transformer language models process idioms? This study examines this question by analysing three models - BERT, Multilingual BERT, and DistilBERT. We compare the embeddings of idiomatic and literal expressions across all layers of the networks at both the sentence and word levels. Additionally, we investigate the attention directed from other sentence tokens towards a word within an idiom as opposed to in a literal context. Results indicate that while the three models exhibit slightly different internal mechanisms, they all represent idioms distinctively compared to literal language, with attention playing a critical role. These findings suggest that idioms are semantically and syntactically idiosyncratic, not only for humans but also for language models.

2022

pdf
Huawei BabelTar NMT at WMT22 Biomedical Translation Task: How We Further Improve Domain-specific NMT
Weixuan Wang | Xupeng Meng | Suqing Yan | Ye Tian | Wei Peng
Proceedings of the Seventh Conference on Machine Translation (WMT)

This paper describes Huawei Artificial Intelligence Application Research Center’s neural machine translation system (“BabelTar”). Our submission to the WMT22 biomedical translation shared task covers language directions between English and the other seven languages (French, German, Italian, Spanish, Portuguese, Russian, and Chinese). During the past four years, our participation in this domain-specific track has witnessed a paradigm shift of methodology from a purely data-driven focus to embracing diversified techniques, including pre-trained multilingual NMT models, homograph disambiguation, ensemble learning, and preprocessing methods. We illustrate practical insights and measured performance improvements relating to how we further improve our domain-specific NMT system.

2021

pdf
Cross-Lingual Transfer with MAML on Trees
Jezabel Garcia | Federica Freddi | Jamie McGowan | Tim Nieradzik | Feng-Ting Liao | Ye Tian | Da-shan Shiu | Alberto Bernacchia
Proceedings of the Second Workshop on Domain Adaptation for NLP

In meta-learning, the knowledge learned from previous tasks is transferred to new ones, but this transfer only works if tasks are related. Sharing information between unrelated tasks might hurt performance, and it is unclear how to transfer knowledge across tasks that have a hierarchical structure. Our research extends a meta-learning model, MAML, by exploiting hierarchical task relationships. Our algorithm, TreeMAML, adapts the model to each task with a few gradient steps, but the adaptation follows the hierarchical tree structure: in each step, gradients are pooled across tasks clusters and subsequent steps follow down the tree. We also implement a clustering algorithm that generates the tasks tree without previous knowledge of the task structure, allowing us to make use of implicit relationships between the tasks. We show that TreeMAML successfully trains natural language processing models for cross-lingual Natural Language Inference by taking advantage of the language phylogenetic tree. This result is useful since most languages in the world are under-resourced and the improvement on cross-lingual transfer allows the internationalization of NLP models.

pdf
How does BERT process disfluency?
Ye Tian | Tim Nieradzik | Sepehr Jalali | Da-shan Shiu
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Natural conversations are filled with disfluencies. This study investigates if and how BERT understands disfluency with three experiments: (1) a behavioural study using a downstream task, (2) an analysis of sentence embeddings and (3) an analysis of the attention mechanism on disfluency. The behavioural study shows that without fine-tuning on disfluent data, BERT does not suffer significant performance loss when presented disfluent compared to fluent inputs (exp1). Analysis on sentence embeddings of disfluent and fluent sentence pairs reveals that the deeper the layer, the more similar their representation (exp2). This indicates that deep layers of BERT become relatively invariant to disfluency. We pinpoint attention as a potential mechanism that could explain this phenomenon (exp3). Overall, the study suggests that BERT has knowledge of disfluency structure. We emphasise the potential of using BERT to understand natural utterances without disfluency removal.

2020

pdf
Learning a Multi-Domain Curriculum for Neural Machine Translation
Wei Wang | Ye Tian | Jiquan Ngiam | Yinfei Yang | Isaac Caswell | Zarana Parekh
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Most data selection research in machine translation focuses on improving a single domain. We perform data selection for multiple domains at once. This is achieved by carefully introducing instance-level domain-relevance features and automatically constructing a training curriculum to gradually concentrate on multi-domain relevant and noise-reduced data batches. Both the choice of features and the use of curriculum are crucial for balancing and improving all domains, including out-of-domain. In large-scale experiments, the multi-domain curriculum simultaneously reaches or outperforms the individual performance and brings solid gains over no-curriculum training.

2018

pdf
Aggression Identification and Multi Lingual Word Embeddings
Thiago Galery | Efstathios Charitos | Ye Tian
Proceedings of the First Workshop on Trolling, Aggression and Cyberbullying (TRAC-2018)

The system presented here took part in the 2018 Trolling, Aggression and Cyberbullying shared task (Forest and Trees team) and uses a Gated Recurrent Neural Network architecture (Cho et al., 2014) in an attempt to assess whether combining pre-trained English and Hindi fastText (Mikolov et al., 2018) word embeddings as a representation of the sequence input would improve classification performance. The motivation for this comes from the fact that the shared task data for English contained many Hindi tokens and therefore some users might be doing code-switching: the alternation between two or more languages in communication. To test this hypothesis, we also aligned Hindi and English vectors using pre-computed SVD matrices that pulls representations from different languages into a common space (Smith et al., 2017). Two conditions were tested: (i) one with standard pre-trained fastText word embeddings where each Hindi word is treated as an OOV token, and (ii) another where word embeddings for Hindi and English are loaded in a common vector space, so Hindi tokens can be assigned a meaningful representation. We submitted the second (i.e., multilingual) system and obtained the scores of 0.531 weighted F1 for the EN-FB dataset and 0.438 weighted F1 for the EN-TW dataset.

pdf
Treat the system like a human student: Automatic naturalness evaluation of generated text without reference texts
Isabel Groves | Ye Tian | Ioannis Douratsos
Proceedings of the 11th International Conference on Natural Language Generation

The current most popular method for automatic Natural Language Generation (NLG) evaluation is comparing generated text with human-written reference sentences using a metrics system, which has drawbacks around reliability and scalability. We draw inspiration from second language (L2) assessment and extract a set of linguistic features to predict human judgments of sentence naturalness. Our experiment using a small dataset showed that the feature-based approach yields promising results, with the added potential of providing interpretability into the source of the problems.

2017

pdf bib
Facebook sentiment: Reactions and Emojis
Ye Tian | Thiago Galery | Giulio Dulcinati | Emilia Molimpakis | Chao Sun
Proceedings of the Fifth International Workshop on Natural Language Processing for Social Media

Emojis are used frequently in social media. A widely assumed view is that emojis express the emotional state of the user, which has led to research focusing on the expressiveness of emojis independent from the linguistic context. We argue that emojis and the linguistic texts can modify the meaning of each other. The overall communicated meaning is not a simple sum of the two channels. In order to study the meaning interplay, we need data indicating the overall sentiment of the entire message as well as the sentiment of the emojis stand-alone. We propose that Facebook Reactions are a good data source for such a purpose. FB reactions (e.g. “Love” and “Angry”) indicate the readers’ overall sentiment, against which we can investigate the types of emojis used the comments under different reaction profiles. We present a data set of 21,000 FB posts (57 million reactions and 8 million comments) from public media pages across four countries.

2016

pdf
When do we laugh?
Ye Tian | Chiara Mazzocconi | Jonathan Ginzburg
Proceedings of the 17th Annual Meeting of the Special Interest Group on Discourse and Dialogue

pdf
DUEL: A Multi-lingual Multimodal Dialogue Corpus for Disfluency, Exclamations and Laughter
Julian Hough | Ye Tian | Laura de Ruiter | Simon Betz | Spyros Kousidis | David Schlangen | Jonathan Ginzburg
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

We present the DUEL corpus, consisting of 24 hours of natural, face-to-face, loosely task-directed dialogue in German, French and Mandarin Chinese. The corpus is uniquely positioned as a cross-linguistic, multimodal dialogue resource controlled for domain. DUEL includes audio, video and body tracking data and is transcribed and annotated for disfluency, laughter and exclamations.

2012

pdf
The CIPS-SIGHAN CLP 2012 ChineseWord Segmentation onMicroBlog Corpora Bakeoff
Huiming Duan | Zhifang Sui | Ye Tian | Wenjie Li
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

pdf
Update Summarization using a Multi-level Hierarchical Dirichlet Process Model
Jiwei Li | Sujian Li | Xun Wang | Ye Tian | Baobao Chang
Proceedings of COLING 2012

pdf
Fine-Grained Classification of Named Entities by Fusing Multi-Features
Wenjie Li | Jiwei Li | Ye Tian | Zhifang Sui
Proceedings of COLING 2012: Posters

2009

pdf
A Novel Method of Sentence Ordering Based on Support Vector Machine
Gongfu Peng | Yanxiang He | Ye Tian | Yingsheng Tian | Weidong Wen
Proceedings of the 23rd Pacific Asia Conference on Language, Information and Computation, Volume 2