Haizhou Li


2022

pdf
M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database
Jinming Zhao | Tenggan Zhang | Jingwen Hu | Yuchen Liu | Qin Jin | Xinchao Wang | Haizhou Li
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

The emotional state of a speaker can be influenced by many different factors in dialogues, such as dialogue scene, dialogue topic, and interlocutor stimulus. The currently available data resources to support such multimodal affective analysis in dialogues are however limited in scale and diversity. In this work, we propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M3ED, which contains 990 dyadic emotional dialogues from 56 different TV series, a total of 9,082 turns and 24,449 utterances. M3ED is annotated with 7 emotion categories (happy, surprise, sad, disgust, anger, fear, and neutral) at utterance level, and encompasses acoustic, visual, and textual modalities. To the best of our knowledge, M3ED is the first multimodal emotional dialogue dataset in Chinese. It is valuable for cross-culture emotion analysis and recognition. We apply several state-of-the-art methods on the M3ED dataset to verify the validity and quality of the dataset. We also propose a general Multimodal Dialogue-aware Interaction framework, MDI, to model the dialogue context for emotion recognition, which achieves comparable performance to the state-of-the-art methods on the M3ED. The full dataset and codes are available.

pdf
Just Rank: Rethinking Evaluation with Word and Sentence Similarities
Bin Wang | C.-C. Jay Kuo | Haizhou Li
Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Word and sentence embeddings are useful feature representations in natural language processing. However, intrinsic evaluation for embeddings lags far behind, and there has been no significant update over the past decade. Word and sentence similarity tasks have become the de facto evaluation method. This leads models to overfit to such evaluations, negatively impacting embedding models’ development. This paper first points out the problems with using semantic similarity as the gold standard for word and sentence embedding evaluations. Further, we propose a new intrinsic evaluation method called EvalRank, which shows a much stronger correlation with downstream tasks. Extensive experiments are conducted based on 60+ models and popular datasets to certify our judgments. Finally, the practical evaluation toolkit is released for future benchmarking purposes.

pdf
FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation
Chen Zhang | Luis Fernando D’Haro | Qiquan Zhang | Thomas Friedrichs | Haizhou Li
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Recent model-based reference-free metrics for open-domain dialogue evaluation exhibit promising correlations with human judgment. However, they either perform turn-level evaluation or look at a single dialogue quality dimension. One would expect a good evaluation metric to assess multiple quality dimensions at the dialogue level. To this end, we are motivated to propose a multi-dimensional dialogue-level metric, which consists of three sub-metrics with each targeting a specific dimension. The sub-metrics are trained with novel self-supervised objectives and exhibit strong correlations with human judgment for their respective dimensions. Moreover, we explore two approaches to combine the sub-metrics: metric ensemble and multitask learning. Both approaches yield a holistic metric that significantly outperforms individual sub-metrics. Compared to the existing state-of-the-art metric, the combined metrics achieve around 16% relative improvement on average across three high-quality dialogue-level evaluation benchmarks.

pdf
Analyzing and Evaluating Faithfulness in Dialogue Summarization
Bin Wang | Chen Zhang | Yan Zhang | Yiming Chen | Haizhou Li
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Dialogue summarization is abstractive in nature, making it suffer from factual errors. The factual correctness of summaries has the highest priority before practical applications. Many efforts have been made to improve faithfulness in text summarization. However, there is a lack of systematic study on dialogue summarization systems. In this work, we first perform a fine-grained human analysis on the faithfulness of dialogue summaries and observe that over 35% of generated summaries are inconsistent with the source dialogues. Furthermore, we present a new model-level faithfulness evaluation method. It examines generation models with multi-choice questions created by rule-based transformations. Experimental results show that our evaluation schema is a strong proxy for the factual correctness of summarization models. The human-annotated faithfulness samples and the evaluation toolkit are released to facilitate future research toward faithful dialogue summarization.

pdf
Generate, Discriminate and Contrast: A Semi-Supervised Sentence Representation Learning Framework
Yiming Chen | Yan Zhang | Bin Wang | Zuozhu Liu | Haizhou Li
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Most sentence embedding techniques heavily rely on expensive human-annotated sentence pairs as the supervised signals. Despite the use of large-scale unlabeled data, the performance of unsupervised methods typically lags far behind that of the supervised counterparts in most downstream tasks. In this work, we propose a semi-supervised sentence embedding framework, GenSE, that effectively leverages large-scale unlabeled data. Our method includes three parts: 1) Generate: A generator/discriminator model is jointly trained to synthesize sentence pairs from open-domain unlabeled corpus; 2) Discriminate: Noisy sentence pairs are filtered out by the discriminator to acquire high-quality positive and negative sentence pairs; 3) Contrast: A prompt-based contrastive approach is presented for sentence representation learning with both annotated and synthesized data. Comprehensive experiments show that GenSE achieves an average correlation score of 85.19 on the STS datasets and consistent performance improvement on four domain adaptation tasks, significantly surpassing the state-of-the-art methods and convincingly corroborating its effectiveness and generalization ability.

2021

pdf
Revisiting Self-training for Few-shot Learning of Language Model
Yiming Chen | Yan Zhang | Chen Zhang | Grandee Lee | Ran Cheng | Haizhou Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

As unlabeled data carry rich task-relevant information, they are proven useful for few-shot learning of language models. The question is how to effectively make use of such data. In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM. Given two views of a text sample via weak and strong augmentation techniques, SFLM generates a pseudo label on the weakly augmented version. Then, the model predicts the same pseudo label when fine-tuned with the strongly augmented version. This simple approach is shown to outperform other state-of-the-art supervised and semi-supervised counterparts on six sentence classification and six sentence-pair classification benchmarking tasks. In addition, SFLM only relies on a few in-domain unlabeled data. We conduct a comprehensive analysis to demonstrate the robustness of our proposed approach under various settings, including augmentation techniques, model scale, and few-shot knowledge transfer across tasks.

pdf
Bootstrapped Unsupervised Sentence Representation Learning
Yan Zhang | Ruidan He | Zuozhu Liu | Lidong Bing | Haizhou Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

As high-quality labeled data is scarce, unsupervised sentence representation learning has attracted much attention. In this paper, we propose a new framework with a two-branch Siamese Network which maximizes the similarity between two augmented views of each sentence. Specifically, given one augmented view of the input sentence, the online network branch is trained by predicting the representation yielded by the target network of the same sentence under another augmented view. Meanwhile, the target network branch is bootstrapped with a moving average of the online network. The proposed method significantly outperforms other state-of-the-art unsupervised methods on semantic textual similarity (STS) and classification tasks. It can be adopted as a post-training procedure to boost the performance of the supervised methods. We further extend our method for learning multilingual sentence representations and demonstrate its effectiveness on cross-lingual STS tasks. Our code is available at https://github.com/yanzhangnlp/BSL.

pdf
DynaEval: Unifying Turn and Dialogue Level Evaluation
Chen Zhang | Yiming Chen | Luis Fernando D’Haro | Yan Zhang | Thomas Friedrichs | Grandee Lee | Haizhou Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

A dialogue is essentially a multi-turn interaction among interlocutors. Effective evaluation metrics should reflect the dynamics of such interaction. Existing automatic metrics are focused very much on the turn-level quality, while ignoring such dynamics. To this end, we propose DynaEval, a unified automatic evaluation framework which is not only capable of performing turn-level evaluation, but also holistically considers the quality of the entire dialogue. In DynaEval, the graph convolutional network (GCN) is adopted to model a dialogue in totality, where the graph nodes denote each individual utterance and the edges represent the dependency between pairs of utterances. A contrastive loss is then applied to distinguish well-formed dialogues from carefully constructed negative samples. Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model, and correlates strongly with human judgements across multiple dialogue evaluation aspects at both turn and dialogue level.

pdf bib
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue
Haizhou Li | Gina-Anne Levow | Zhou Yu | Chitralekha Gupta | Berrak Sisman | Siqi Cai | David Vandyke | Nina Dethlefs | Yan Wu | Junyi Jessy Li
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

2020

pdf
Modeling Code-Switch Languages Using Bilingual Parallel Corpus
Grandee Lee | Haizhou Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Language modeling is the technique to estimate the probability of a sequence of words. A bilingual language model is expected to model the sequential dependency for words across languages, which is difficult due to the inherent lack of suitable training data as well as diverse syntactic structure across languages. We propose a bilingual attention language model (BALM) that simultaneously optimizes a language modeling objective and a quasi-translation objective to model both the monolingual as well as the cross-lingual sequential dependency. The attention mechanism learns the bilingual context from a parallel corpus. BALM achieves state-of-the-art performance on the SEAME code-switch database by reducing the perplexity by 20.5% over the best-reported result. We also apply BALM in bilingual lexicon induction and language normalization tasks to validate the idea.

2018

pdf bib
Proceedings of the Seventh Named Entities Workshop
Nancy Chen | Rafael E. Banchs | Xiangyu Duan | Min Zhang | Haizhou Li
Proceedings of the Seventh Named Entities Workshop

pdf
Named-Entity Tagging and Domain adaptation for Better Customized Translation
Zhongwei Li | Xuancong Wang | Ai Ti Aw | Eng Siong Chng | Haizhou Li
Proceedings of the Seventh Named Entities Workshop

Customized translation needs to pay special attention to the target domain terminology, especially the named entities for the domain. Adding linguistic features to neural machine translation (NMT) has been shown to benefit translation in many studies. In this paper, we further demonstrate that adding named-entity (NE) features with named-entity recognition (NER) into the source language produces better translation with NMT. Our experiments show that by just including the different NE classes and boundary tags, we can increase the BLEU score by around 1 to 2 points using the standard test sets from WMT2017. We also show that adding NE tags using NER and applying in-domain adaptation can be combined to further improve customized machine translation.

pdf
NEWS 2018 Whitepaper
Nancy Chen | Xiangyu Duan | Min Zhang | Rafael E. Banchs | Haizhou Li
Proceedings of the Seventh Named Entities Workshop

Transliteration is defined as phonetic translation of names across languages. Transliteration of Named Entities (NEs) is necessary in many applications, such as machine translation, corpus alignment, cross-language IR, information extraction and automatic lexicon acquisition. All such systems call for high-performance transliteration, which is the focus of the shared task in the NEWS 2018 workshop. The objective of the shared task is to promote machine transliteration research by providing a common benchmarking platform for the community to evaluate the state-of-the-art technologies.

pdf
Report of NEWS 2018 Named Entity Transliteration Shared Task
Nancy Chen | Rafael E. Banchs | Min Zhang | Xiangyu Duan | Haizhou Li
Proceedings of the Seventh Named Entities Workshop

This report presents the results from the Named Entity Transliteration Shared Task conducted as part of The Seventh Named Entities Workshop (NEWS 2018) held at ACL 2018 in Melbourne, Australia. Similar to previous editions of NEWS, the Shared Task featured 19 tasks on proper name transliteration, including 13 different languages and two different Japanese scripts. A total of 6 teams from 8 different institutions participated in the evaluation, submitting 424 runs, involving different transliteration methodologies. Four performance metrics were used to report the evaluation results. The NEWS shared task on machine transliteration has successfully achieved its objectives by providing a common ground for the research community to conduct comparative evaluations of state-of-the-art technologies that will benefit the future research and development in this area.

2016

pdf
Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling for Dialogue Topic Tracking
Seokhwan Kim | Rafael Banchs | Haizhou Li
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Proceedings of the Sixth Named Entity Workshop
Xiangyu Duan | Rafael E. Banchs | Min Zhang | Haizhou Li | A Kumaran
Proceedings of the Sixth Named Entity Workshop

pdf
Evaluating and Combining Name Entity Recognition Systems
Ridong Jiang | Rafael E. Banchs | Haizhou Li
Proceedings of the Sixth Named Entity Workshop

pdf
Whitepaper of NEWS 2016 Shared Task on Machine Transliteration
Xiangyu Duan | Min Zhang | Haizhou Li | Rafael Banchs | A Kumaran
Proceedings of the Sixth Named Entity Workshop

pdf
Report of NEWS 2016 Machine Transliteration Shared Task
Xiangyu Duan | Rafael Banchs | Min Zhang | Haizhou Li | A. Kumaran
Proceedings of the Sixth Named Entity Workshop

2015

pdf
Wikification of Concept Mentions within Spoken Dialogues Using Domain Constraints from Wikipedia
Seokhwan Kim | Rafael E. Banchs | Haizhou Li
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the Fifth Named Entity Workshop
Xiangyu Duan | Rafael E. Banchs | Min Zhang | Haizhou Li | A Kumaran
Proceedings of the Fifth Named Entity Workshop

pdf bib
Whitepaper of NEWS 2015 Shared Task on Machine Transliteration
Min Zhang | Haizhou Li | Rafael E. Banchs | A Kumaran
Proceedings of the Fifth Named Entity Workshop

pdf bib
Report of NEWS 2015 Machine Transliteration Shared Task
Rafael E. Banchs | Min Zhang | Xiangyu Duan | Haizhou Li | A. Kumaran
Proceedings of the Fifth Named Entity Workshop

pdf
Towards Improving Dialogue Topic Tracking Performances with Wikification of Concept Mentions
Seokhwan Kim | Rafael E. Banchs | Haizhou Li
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

2014

pdf
A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain Knowledge from Wikipedia
Seokhwan Kim | Rafael E. Banchs | Haizhou Li
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

pdf bib
Meaning Unit Segmentation in English and Chinese: a New Approach to Discourse Phenomena
Jennifer Williams | Rafael Banchs | Haizhou Li
Proceedings of the Workshop on Discourse in Machine Translation

pdf
Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions
Xiaoming Lu | Lei Xie | Cheung-Chi Leung | Bin Ma | Haizhou Li
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf
Modeling of term-distance and term-occurrence information for improving n-gram language model performance
Tze Yuang Chong | Rafael E. Banchs | Eng Siong Chng | Haizhou Li
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf bib
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Haizhou Li | Chin-Yew Lin | Miles Osborne | Gary Geunbae Lee | Jong C. Park
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
Wenliang Chen | Min Zhang | Haizhou Li
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf
Modeling the Translation of Predicate-Argument Structure for SMT
Deyi Xiong | Min Zhang | Haizhou Li
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

pdf bib
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Haizhou Li | Chin-Yew Lin | Miles Osborne | Gary Geunbae Lee | Jong C. Park
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

pdf
IRIS: a Chat-oriented Dialogue System based on the Vector Space Model
Rafael E. Banchs | Haizhou Li
Proceedings of the ACL 2012 System Demonstrations

pdf bib
Proceedings of the 4th Named Entity Workshop (NEWS) 2012
Min Zhang | Haizhou Li | A Kumaran
Proceedings of the 4th Named Entity Workshop (NEWS) 2012

pdf bib
Whitepaper of NEWS 2012 Shared Task on Machine Transliteration
Min Zhang | Haizhou Li | A Kumaran | Ming Liu
Proceedings of the 4th Named Entity Workshop (NEWS) 2012

pdf bib
Report of NEWS 2012 Machine Transliteration Shared Task
Min Zhang | Haizhou Li | A Kumaran | Ming Liu
Proceedings of the 4th Named Entity Workshop (NEWS) 2012

2011

pdf
CLGVSM: Adapting Generalized Vector Space Model to Cross-lingual Document Clustering
Guoyu Tang | Yunqing Xia | Min Zhang | Haizhou Li | Fang Zheng
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf
Joint Alignment and Artificial Data Generation: An Empirical Study of Pivot-based Machine Transliteration
Min Zhang | Xiangyu Duan | Ming Liu | Yunqing Xia | Haizhou Li
Proceedings of 5th International Joint Conference on Natural Language Processing

pdf
SMT Helps Bitext Dependency Parsing
Wenliang Chen | Jun’ichi Kazama | Min Zhang | Yoshimasa Tsuruoka | Yujie Zhang | Yiou Wang | Kentaro Torisawa | Haizhou Li
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf
Joint Models for Chinese POS Tagging and Dependency Parsing
Zhenghua Li | Min Zhang | Wanxiang Che | Ting Liu | Wenliang Chen | Haizhou Li
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

pdf bib
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)
Min Zhang | Haizhou Li | A Kumaran
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)

pdf bib
Report of NEWS 2011 Machine Transliteration Shared Task
Min Zhang | Haizhou Li | A Kumaran | Ming Liu
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)

pdf bib
Whitepaper of NEWS 2011 Shared Task on Machine Transliteration
Min Zhang | A Kumaran | Haizhou Li
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)

pdf
Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers
Deyi Xiong | Min Zhang | Haizhou Li
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

pdf
AM-FM: A Semantic Framework for Translation Quality Assessment
Rafael E. Banchs | Haizhou Li
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

2010

pdf
Pseudo-Word for Phrase-Based Machine Translation
Xiangyu Duan | Min Zhang | Haizhou Li
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf
Error Detection for Statistical Machine Translation Using Linguistic Features
Deyi Xiong | Min Zhang | Haizhou Li
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf
Convolution Kernel over Packed Parse Forest
Min Zhang | Hui Zhang | Haizhou Li
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

pdf
Linguistically Annotated Reordering: Evaluation and Analysis
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Computational Linguistics, Volume 36, Issue 3 - September 2010

pdf bib
Proceedings of the 2010 Named Entities Workshop
A Kumaran | Haizhou Li
Proceedings of the 2010 Named Entities Workshop

pdf bib
Report of NEWS 2010 Transliteration Generation Shared Task
Haizhou Li | A Kumaran | Min Zhang | Vladimir Pervouchine
Proceedings of the 2010 Named Entities Workshop

pdf bib
Whitepaper of NEWS 2010 Shared Task on Transliteration Generation
Haizhou Li | A Kumaran | Min Zhang | Vladimir Pervouchine
Proceedings of the 2010 Named Entities Workshop

pdf
Report of NEWS 2010 Transliteration Mining Shared Task
A Kumaran | Mitesh M. Khapra | Haizhou Li
Proceedings of the 2010 Named Entities Workshop

pdf
Whitepaper of NEWS 2010 Shared Task on Transliteration Mining
A Kumaran | Mitesh M. Khapra | Haizhou Li
Proceedings of the 2010 Named Entities Workshop

pdf
Non-Isomorphic Forest Pair Translation
Hui Zhang | Min Zhang | Haizhou Li | Eng Siong Chng
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

pdf
EM-based Hybrid Model for Bilingual Terminology Extraction from Comparable Corpora
Lianhau Lee | Aiti Aw | Min Zhang | Haizhou Li
Coling 2010: Posters

pdf
Improving Name Origin Recognition with Context Features and Unlabelled Data
Vladimir Pervouchine | Min Zhang | Ming Liu | Haizhou Li
Coling 2010: Posters

pdf
Machine Transliteration: Leveraging on Third Languages
Min Zhang | Xiangyu Duan | Vladimir Pervouchine | Haizhou Li
Coling 2010: Posters

pdf
I2R’s machine translation system for IWSLT 2010
Xiangyu Duan | Rafael Banchs | Jun Lang | Deyi Xiong | Aiti Aw | Min Zhang | Haizhou Li
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

pdf
Learning Translation Boundaries for Phrase-Based Decoding
Deyi Xiong | Min Zhang | Haizhou Li
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

pdf
Tree Kernel-based SVM with Structured Syntactic Knowledge for BTG-based Phrase Reordering
Min Zhang | Haizhou Li
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf
Fast Translation Rule Matching for Syntax-based Statistical Machine Translation
Hui Zhang | Min Zhang | Haizhou Li | Chew Lim Tan
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf
K-Best Combination of Syntactic Parsers
Hui Zhang | Min Zhang | Chew Lim Tan | Haizhou Li
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

pdf
I2R’s machine translation system for IWSLT 2009
Xiangyu Duan | Deyi Xiong | Hui Zhang | Min Zhang | Haizhou Li
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper, we describe the system and approach used by the Institute for Infocomm Research (I2R) for the IWSLT 2009 spoken language translation evaluation campaign. Two kinds of machine translation systems are applied, namely, a phrase-based machine translation system and a syntax-based machine translation system. To test the syntax-based machine translation system on spoken language translation, variational systems are explored. On top of both phrase-based and syntax-based single systems, we further use a rescoring method to improve the individual system performance and a system combination method to combine the strengths of the different individual systems. Rescoring is applied on each single system output, and system combination is applied on all rescoring outputs. Finally, our system combination framework shows better performance in the Chinese-English BTEC task.

pdf
Efficient Beam Thresholding for Statistical Machine Translation
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of Machine Translation Summit XII: Posters

pdf
A Source Dependency Model for Statistical Machine Translation
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of Machine Translation Summit XII: Posters

pdf bib
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)
Haizhou Li | A Kumaran
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

pdf bib
Report of NEWS 2009 Machine Transliteration Shared Task
Haizhou Li | A Kumaran | Vladimir Pervouchine | Min Zhang
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

pdf bib
Whitepaper of NEWS 2009 Machine Transliteration Shared Task
Haizhou Li | A Kumaran | Min Zhang | Vladimir Pervouchine
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

pdf bib
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP
Keh-Yih Su | Jian Su | Janyce Wiebe | Haizhou Li
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf
Transliteration Alignment
Vladimir Pervouchine | Haizhou Li | Bo Lin
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf
Forest-based Tree Sequence to String Translation Model
Hui Zhang | Min Zhang | Haizhou Li | Aiti Aw | Chew Lim Tan
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf
A Syntax-Driven Bracketing Model for Phrase-Based Translation
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf
Topological Ordering of Function Words in Hierarchical Phrase-based Translation
Hendra Setiawan | Min-Yen Kan | Haizhou Li | Philip Resnik
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf
A Comparative Study of Hypothesis Alignment and its Improvement for Machine Translation System Combination
Boxing Chen | Min Zhang | Haizhou Li | Aiti Aw
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

pdf bib
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Keh-Yih Su | Jian Su | Janyce Wiebe | Haizhou Li
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

pdf
MARS: Multilingual Access and Retrieval System with Enhanced Query Translation and Document Retrieval
Lianhau Lee | Aiti Aw | Thuy Vu | Sharifah Aljunied Mahani | Min Zhang | Haizhou Li
Proceedings of the ACL-IJCNLP 2009 Software Demonstrations

2008

pdf
A Tree Sequence Alignment-based Tree-to-Tree Translation Model
Min Zhang | Hongfei Jiang | Aiti Aw | Haizhou Li | Chew Lim Tan | Sheng Li
Proceedings of ACL-08: HLT

pdf
A Linguistically Annotated Reordering Model for BTG-based Statistical Machine Translation
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of ACL-08: HLT, Short Papers

pdf
Exploiting N-best Hypotheses for SMT Self-Enhancement
Boxing Chen | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of ACL-08: HLT, Short Papers

pdf
Regenerating Hypotheses for Statistical Machine Translation
Boxing Chen | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf
Linguistically Annotated BTG for Statistical Machine Translation
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf
Grammar Comparison Study for Translational Equivalence Modeling and Statistical Machine Translation
Min Zhang | Hongfei Jiang | Haizhou Li | Aiti Aw | Sheng Li
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

pdf
Name Origin Recognition Using Maximum Entropy Model and Diverse Features
Min Zhang | Chengjie Sun | Haizhou Li | AiTi Aw | Chew Lim Tan | Xiaolong Wang
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf
Multi-View Co-Training of Transliteration Model
Jin-Shea Kuo | Haizhou Li
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

pdf
Mining Transliterations from Web Query Results: An Incremental Approach
Jin-Shea Kuo | Haizhou Li | Chih-Lung Lin
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

pdf
I2R multi-pass machine translation system for IWSLT 2008.
Boxing Chen | Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper, we describe the system and approach used by the Institute for Infocomm Research (I2R) for the IWSLT 2008 spoken language translation evaluation campaign. In the system, we integrate various decoding algorithms into a multi-pass translation framework. The multi-pass approach enables us to utilize various decoding algorithms and to explore many more hypotheses. This paper reports our design philosophy, overall architecture, each individual system and various system combination methods that we have explored. The performance on development and test sets is reported in detail in the paper. The system has shown competitive performance with respect to the BLEU and METEOR measures in Chinese-English Challenge and BTEC tasks.

pdf
The TALP&I2R SMT systems for IWSLT 2008.
Maxim Khalilov | Maria R. Costa-jussà | Carlos A. Henríquez Q. | José A. R. Fonollosa | Adolfo Hernández H. | José B. Mariño | Rafael E. Banchs | Chen Boxing | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper describes the statistical machine translation (SMT) systems developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) for our participation in the IWSLT’08 evaluation campaign. We present Ngram-based (TALPtuples) and phrase-based (TALPphrases) SMT systems. The paper explains the architecture of the 2008 systems and outlines the translation schemes we have used, focusing mainly on the new techniques intended to improve speech-to-speech translation quality. The novelties we have introduced are an improved reordering method, a linear combination of translation and reordering models, and a new technique for inserting punctuation marks in a phrase-based SMT system. This year we focus on the Arabic-English, Chinese-Spanish, and pivot Chinese-(English)-Spanish translation tasks.

pdf
NIST 2007 Language Recognition Evaluation: From the Perspective of IIR
Haizhou Li | Bin Ma | Kong-Aik Lee | Khe-Chai Sim | Hanwu Sun | Rong Tong | Donglai Zhu | Changhuai You
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation

2007

pdf
Semantic Transliteration of Personal Names
Haizhou Li | Khe Chai Sim | Jin-Shea Kuo | Minghui Dong
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf
Ordering Phrases with Function Words
Hendra Setiawan | Min-Yen Kan | Haizhou Li
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf
A Statistical Language Modeling Approach to Lattice-Based Spoken Document Retrieval
Tee Kiah Chia | Haizhou Li | Hwee Tou Ng
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

pdf
Learning Transliteration Lexicons from the Web
Jin-Shea Kuo | Haizhou Li | Ying-Kuei Yang
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

pdf
A Comparative Study of Four Language Identification Systems
Bin Ma | Haizhou Li
International Journal of Computational Linguistics & Chinese Language Processing, Volume 11, Number 2, June 2006

2005

pdf
A Phonotactic Language Model for Spoken Language Identification
Haizhou Li | Bin Ma
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

pdf
Phrase-Based Statistical Machine Translation: A Level of Detail Approach
Hendra Setiawan | Haizhou Li | Min Zhang | Beng Chin Ooi
Second International Joint Conference on Natural Language Processing: Full Papers

pdf
A Phrase-Based Context-Dependent Joint Probability Model for Named Entity Translation
Min Zhang | Haizhou Li | Jian Su | Hendra Setiawan
Second International Joint Conference on Natural Language Processing: Full Papers

pdf
Learning Phrase Translation using Level of Detail Approach
Hendra Setiawan | Haizhou Li | Min Zhang
Proceedings of Machine Translation Summit X: Papers

We propose a simplified Level Of Detail (LOD) algorithm to learn phrase translation for statistical machine translation. In particular, LOD learns unknown phrase translations from parallel texts without linguistic knowledge. LOD uses an agglomerative method to attack the combinatorial explosion that results when generating candidate phrase translations. Although LOD was previously proposed by Setiawan et al. (2005), we improve the original algorithm in two ways: simplifying the algorithm and using a simpler translation model. Experimental results show that our algorithm provides comparable performance while demonstrating a significant reduction in computation time.

2004

pdf
Direct Orthographical Mapping for Machine Transliteration
Min Zhang | Haizhou Li | Jian Su
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

pdf
A Joint Source-Channel Model for Machine Transliteration
Haizhou Li | Min Zhang | Jian Su
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

1998

pdf
Chinese Word Segmentation
Haizhou Li | Baosheng Yuan
Proceedings of the 12th Pacific Asia Conference on Language, Information and Computation
