Haizhou Li


2021

Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue
Haizhou Li | Gina-Anne Levow | Zhou Yu | Chitralekha Gupta | Berrak Sisman | Siqi Cai | David Vandyke | Nina Dethlefs | Yan Wu | Junyi Jessy Li
Proceedings of the 22nd Annual Meeting of the Special Interest Group on Discourse and Dialogue

Bootstrapped Unsupervised Sentence Representation Learning
Yan Zhang | Ruidan He | Zuozhu Liu | Lidong Bing | Haizhou Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

As high-quality labeled data is scarce, unsupervised sentence representation learning has attracted much attention. In this paper, we propose a new framework with a two-branch Siamese Network which maximizes the similarity between two augmented views of each sentence. Specifically, given one augmented view of the input sentence, the online network branch is trained by predicting the representation yielded by the target network of the same sentence under another augmented view. Meanwhile, the target network branch is bootstrapped with a moving average of the online network. The proposed method significantly outperforms other state-of-the-art unsupervised methods on semantic textual similarity (STS) and classification tasks. It can be adopted as a post-training procedure to boost the performance of the supervised methods. We further extend our method for learning multilingual sentence representations and demonstrate its effectiveness on cross-lingual STS tasks. Our code is available at https://github.com/yanzhangnlp/BSL.
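
A minimal sketch of the bootstrapping idea described in the abstract above, not the authors' implementation: the target parameters track an exponential moving average of the online parameters, and the online prediction is scored against the target representation. The function names, the `decay` value, and the use of negative cosine similarity as the loss are illustrative assumptions.

```python
import math

def ema_update(target_params, online_params, decay=0.99):
    """Move each target parameter a small step toward its online counterpart."""
    return [decay * t + (1.0 - decay) * o
            for t, o in zip(target_params, online_params)]

def cosine_similarity(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def similarity_loss(online_prediction, target_representation):
    """Loss that decreases as the two views agree (assumed form)."""
    return -cosine_similarity(online_prediction, target_representation)
```

In a training loop of this kind, only the online branch would receive gradients, and `ema_update` would be applied to the target branch after each optimizer step.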

DynaEval: Unifying Turn and Dialogue Level Evaluation
Chen Zhang | Yiming Chen | Luis Fernando D’Haro | Yan Zhang | Thomas Friedrichs | Grandee Lee | Haizhou Li
Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

A dialogue is essentially a multi-turn interaction among interlocutors. Effective evaluation metrics should reflect the dynamics of such interaction. Existing automatic metrics are focused very much on the turn-level quality, while ignoring such dynamics. To this end, we propose DynaEval, a unified automatic evaluation framework which is not only capable of performing turn-level evaluation, but also holistically considers the quality of the entire dialogue. In DynaEval, the graph convolutional network (GCN) is adopted to model a dialogue in totality, where the graph nodes denote each individual utterance and the edges represent the dependency between pairs of utterances. A contrastive loss is then applied to distinguish well-formed dialogues from carefully constructed negative samples. Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model, and correlates strongly with human judgements across multiple dialogue evaluation aspects at both turn and dialogue level.
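
The abstract above does not spell out the form of the contrastive loss. One common choice, shown here purely as an assumption, is a margin ranking loss that pushes the score of a well-formed dialogue above the score of a negative sample by at least `margin`. The function name and the default margin are hypothetical.

```python
def margin_ranking_loss(positive_score, negative_score, margin=0.5):
    """Zero once the positive dialogue outscores the negative by `margin`."""
    return max(0.0, margin - positive_score + negative_score)
```

The scores would come from the dialogue-level model; here they are plain floats so the loss can be inspected in isolation.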

Revisiting Self-training for Few-shot Learning of Language Model
Yiming Chen | Yan Zhang | Chen Zhang | Grandee Lee | Ran Cheng | Haizhou Li
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing

As unlabeled data carry rich task-relevant information, they have proven useful for few-shot learning of language models. The question is how to make effective use of such data. In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM. Given two views of a text sample via weak and strong augmentation techniques, SFLM generates a pseudo label on the weakly augmented version. Then, the model is fine-tuned to predict the same pseudo label on the strongly augmented version. This simple approach is shown to outperform other state-of-the-art supervised and semi-supervised counterparts on six sentence classification and six sentence-pair classification benchmarking tasks. In addition, SFLM relies on only a small amount of in-domain unlabeled data. We conduct a comprehensive analysis to demonstrate the robustness of our proposed approach under various settings, including augmentation techniques, model scale, and few-shot knowledge transfer across tasks.
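
The weak/strong consistency step in the abstract above can be sketched as follows. This is an illustration under stated assumptions, not the SFLM code: the confidence `threshold`, the function names, and the use of cross-entropy on the strongly augmented view are all assumed for the example.

```python
import math

def pseudo_label(weak_probs, threshold=0.9):
    """Return the argmax class of the weak view, or None if not confident."""
    best = max(range(len(weak_probs)), key=lambda i: weak_probs[i])
    return best if weak_probs[best] >= threshold else None

def self_training_loss(weak_probs, strong_probs, threshold=0.9):
    """Cross-entropy of the strong view against the weak view's pseudo label."""
    label = pseudo_label(weak_probs, threshold)
    if label is None:
        return 0.0  # unconfident samples contribute nothing (assumed policy)
    return -math.log(strong_probs[label])
```

Both arguments are class-probability lists for the same unlabeled sample; in practice they would be model outputs on the two augmented views.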

2020

Modeling Code-Switch Languages Using Bilingual Parallel Corpus
Grandee Lee | Haizhou Li
Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics

Language modeling is the technique of estimating the probability of a sequence of words. A bilingual language model is expected to model the sequential dependency for words across languages, which is difficult due to the inherent lack of suitable training data as well as the diverse syntactic structures across languages. We propose a bilingual attention language model (BALM) that simultaneously optimizes a language modeling objective and a quasi-translation objective to model both the monolingual and the cross-lingual sequential dependency. The attention mechanism learns the bilingual context from a parallel corpus. BALM achieves state-of-the-art performance on the SEAME code-switch database, reducing the perplexity by 20.5% over the best-reported result. We also apply BALM to bilingual lexicon induction and language normalization tasks to validate the idea.
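
For readers unfamiliar with the metric in the abstract above, perplexity is the exponential of the average negative log-probability per token, and a relative reduction compares a new value against a baseline. The sketch below uses hypothetical function names and made-up numbers; none of the values are taken from the paper.

```python
import math

def perplexity(token_probs):
    """Perplexity of a sequence given the model probability of each token."""
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)

def relative_reduction(baseline, new):
    """Fractional reduction of `new` relative to `baseline`."""
    return (baseline - new) / baseline
```

For instance, a model that assigns probability 0.25 to every token has perplexity 4, and moving from a baseline perplexity of 100 to 79.5 would be a 20.5% relative reduction.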

2018

Proceedings of the Seventh Named Entities Workshop
Nancy Chen | Rafael E. Banchs | Xiangyu Duan | Min Zhang | Haizhou Li
Proceedings of the Seventh Named Entities Workshop

Named-Entity Tagging and Domain adaptation for Better Customized Translation
Zhongwei Li | Xuancong Wang | Ai Ti Aw | Eng Siong Chng | Haizhou Li
Proceedings of the Seventh Named Entities Workshop

Customized translation needs to pay special attention to target-domain terminology, especially the named entities of the domain. Adding linguistic features to neural machine translation (NMT) has been shown to benefit translation in many studies. In this paper, we further demonstrate that adding named-entity (NE) features obtained with named-entity recognition (NER) to the source language produces better translation with NMT. Our experiments show that by just including the different NE classes and boundary tags, we can increase the BLEU score by around 1 to 2 points using the standard test sets from WMT2017. We also show that adding NE tags using NER and applying in-domain adaptation can be combined to further improve customized machine translation.

NEWS 2018 Whitepaper
Nancy Chen | Xiangyu Duan | Min Zhang | Rafael E. Banchs | Haizhou Li
Proceedings of the Seventh Named Entities Workshop

Transliteration is defined as the phonetic translation of names across languages. Transliteration of Named Entities (NEs) is necessary in many applications, such as machine translation, corpus alignment, cross-language IR, information extraction and automatic lexicon acquisition. All such systems call for high-performance transliteration, which is the focus of the shared task in the NEWS 2018 workshop. The objective of the shared task is to promote machine transliteration research by providing a common benchmarking platform for the community to evaluate the state-of-the-art technologies.

Report of NEWS 2018 Named Entity Transliteration Shared Task
Nancy Chen | Rafael E. Banchs | Min Zhang | Xiangyu Duan | Haizhou Li
Proceedings of the Seventh Named Entities Workshop

This report presents the results from the Named Entity Transliteration Shared Task conducted as part of The Seventh Named Entities Workshop (NEWS 2018) held at ACL 2018 in Melbourne, Australia. Similar to previous editions of NEWS, the Shared Task featured 19 tasks on proper name transliteration, including 13 different languages and two different Japanese scripts. A total of 6 teams from 8 different institutions participated in the evaluation, submitting 424 runs, involving different transliteration methodologies. Four performance metrics were used to report the evaluation results. The NEWS shared task on machine transliteration has successfully achieved its objectives by providing a common ground for the research community to conduct comparative evaluations of state-of-the-art technologies that will benefit the future research and development in this area.

2016

Proceedings of the Sixth Named Entity Workshop
Xiangyu Duan | Rafael E. Banchs | Min Zhang | Haizhou Li | A Kumaran
Proceedings of the Sixth Named Entity Workshop

Evaluating and Combining Name Entity Recognition Systems
Ridong Jiang | Rafael E. Banchs | Haizhou Li
Proceedings of the Sixth Named Entity Workshop

Whitepaper of NEWS 2016 Shared Task on Machine Transliteration
Xiangyu Duan | Min Zhang | Haizhou Li | Rafael Banchs | A Kumaran
Proceedings of the Sixth Named Entity Workshop

Report of NEWS 2016 Machine Transliteration Shared Task
Xiangyu Duan | Rafael Banchs | Min Zhang | Haizhou Li | A. Kumaran
Proceedings of the Sixth Named Entity Workshop

Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling for Dialogue Topic Tracking
Seokhwan Kim | Rafael Banchs | Haizhou Li
Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2015

Proceedings of the Fifth Named Entity Workshop
Xiangyu Duan | Rafael E. Banchs | Min Zhang | Haizhou Li | A Kumaran
Proceedings of the Fifth Named Entity Workshop

Whitepaper of NEWS 2015 Shared Task on Machine Transliteration
Min Zhang | Haizhou Li | Rafael E. Banchs | A Kumaran
Proceedings of the Fifth Named Entity Workshop

Report of NEWS 2015 Machine Transliteration Shared Task
Rafael E. Banchs | Min Zhang | Xiangyu Duan | Haizhou Li | A. Kumaran
Proceedings of the Fifth Named Entity Workshop

Towards Improving Dialogue Topic Tracking Performances with Wikification of Concept Mentions
Seokhwan Kim | Rafael E. Banchs | Haizhou Li
Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue

Wikification of Concept Mentions within Spoken Dialogues Using Domain Constraints from Wikipedia
Seokhwan Kim | Rafael E. Banchs | Haizhou Li
Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing

2014

A Composite Kernel Approach for Dialog Topic Tracking with Structured Domain Knowledge from Wikipedia
Seokhwan Kim | Rafael E. Banchs | Haizhou Li
Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2013

Meaning Unit Segmentation in English and Chinese: a New Approach to Discourse Phenomena
Jennifer Williams | Rafael Banchs | Haizhou Li
Proceedings of the Workshop on Discourse in Machine Translation

Broadcast News Story Segmentation Using Manifold Learning on Latent Topic Distributions
Xiaoming Lu | Lei Xie | Cheung-Chi Leung | Bin Ma | Haizhou Li
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Modeling of term-distance and term-occurrence information for improving n-gram language model performance
Tze Yuang Chong | Rafael E. Banchs | Eng Siong Chng | Haizhou Li
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
Haizhou Li | Chin-Yew Lin | Miles Osborne | Gary Geunbae Lee | Jong C. Park
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Utilizing Dependency Language Models for Graph-based Dependency Parsing Models
Wenliang Chen | Min Zhang | Haizhou Li
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Modeling the Translation of Predicate-Argument Structure for SMT
Deyi Xiong | Min Zhang | Haizhou Li
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
Haizhou Li | Chin-Yew Lin | Miles Osborne | Gary Geunbae Lee | Jong C. Park
Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

IRIS: a Chat-oriented Dialogue System based on the Vector Space Model
Rafael E. Banchs | Haizhou Li
Proceedings of the ACL 2012 System Demonstrations

Proceedings of the 4th Named Entity Workshop (NEWS) 2012
Min Zhang | Haizhou Li | A Kumaran
Proceedings of the 4th Named Entity Workshop (NEWS) 2012

Whitepaper of NEWS 2012 Shared Task on Machine Transliteration
Min Zhang | Haizhou Li | A Kumaran | Ming Liu
Proceedings of the 4th Named Entity Workshop (NEWS) 2012

Report of NEWS 2012 Machine Transliteration Shared Task
Min Zhang | Haizhou Li | A Kumaran | Ming Liu
Proceedings of the 4th Named Entity Workshop (NEWS) 2012

2011

Enhancing Language Models in Statistical Machine Translation with Backward N-grams and Mutual Information Triggers
Deyi Xiong | Min Zhang | Haizhou Li
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

AM-FM: A Semantic Framework for Translation Quality Assessment
Rafael E. Banchs | Haizhou Li
Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies

CLGVSM: Adapting Generalized Vector Space Model to Cross-lingual Document Clustering
Guoyu Tang | Yunqing Xia | Min Zhang | Haizhou Li | Fang Zheng
Proceedings of 5th International Joint Conference on Natural Language Processing

Joint Alignment and Artificial Data Generation: An Empirical Study of Pivot-based Machine Transliteration
Min Zhang | Xiangyu Duan | Ming Liu | Yunqing Xia | Haizhou Li
Proceedings of 5th International Joint Conference on Natural Language Processing

SMT Helps Bitext Dependency Parsing
Wenliang Chen | Jun’ichi Kazama | Min Zhang | Yoshimasa Tsuruoka | Yujie Zhang | Yiou Wang | Kentaro Torisawa | Haizhou Li
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Joint Models for Chinese POS Tagging and Dependency Parsing
Zhenghua Li | Min Zhang | Wanxiang Che | Ting Liu | Wenliang Chen | Haizhou Li
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

Proceedings of the 3rd Named Entities Workshop (NEWS 2011)
Min Zhang | Haizhou Li | A Kumaran
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)

Report of NEWS 2011 Machine Transliteration Shared Task
Min Zhang | Haizhou Li | A Kumaran | Ming Liu
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)

Whitepaper of NEWS 2011 Shared Task on Machine Transliteration
Min Zhang | A Kumaran | Haizhou Li
Proceedings of the 3rd Named Entities Workshop (NEWS 2011)

2010

I2R’s machine translation system for IWSLT 2010
Xiangyu Duan | Rafael Banchs | Jun Lang | Deyi Xiong | Aiti Aw | Min Zhang | Haizhou Li
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign

Non-Isomorphic Forest Pair Translation
Hui Zhang | Min Zhang | Haizhou Li | Eng Siong Chng
Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing

EM-based Hybrid Model for Bilingual Terminology Extraction from Comparable Corpora
Lianhau Lee | Aiti Aw | Min Zhang | Haizhou Li
Coling 2010: Posters

Improving Name Origin Recognition with Context Features and Unlabelled Data
Vladimir Pervouchine | Min Zhang | Ming Liu | Haizhou Li
Coling 2010: Posters

Machine Transliteration: Leveraging on Third Languages
Min Zhang | Xiangyu Duan | Vladimir Pervouchine | Haizhou Li
Coling 2010: Posters

Linguistically Annotated Reordering: Evaluation and Analysis
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Computational Linguistics, Volume 36, Issue 3 - September 2010

Learning Translation Boundaries for Phrase-Based Decoding
Deyi Xiong | Min Zhang | Haizhou Li
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Pseudo-Word for Phrase-Based Machine Translation
Xiangyu Duan | Min Zhang | Haizhou Li
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Error Detection for Statistical Machine Translation Using Linguistic Features
Deyi Xiong | Min Zhang | Haizhou Li
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Convolution Kernel over Packed Parse Forest
Min Zhang | Hui Zhang | Haizhou Li
Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics

Proceedings of the 2010 Named Entities Workshop
A Kumaran | Haizhou Li
Proceedings of the 2010 Named Entities Workshop

Report of NEWS 2010 Transliteration Generation Shared Task
Haizhou Li | A Kumaran | Min Zhang | Vladimir Pervouchine
Proceedings of the 2010 Named Entities Workshop

Whitepaper of NEWS 2010 Shared Task on Transliteration Generation
Haizhou Li | A Kumaran | Min Zhang | Vladimir Pervouchine
Proceedings of the 2010 Named Entities Workshop

Report of NEWS 2010 Transliteration Mining Shared Task
A Kumaran | Mitesh M. Khapra | Haizhou Li
Proceedings of the 2010 Named Entities Workshop

Whitepaper of NEWS 2010 Shared Task on Transliteration Mining
A Kumaran | Mitesh M. Khapra | Haizhou Li
Proceedings of the 2010 Named Entities Workshop

2009

Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP
Keh-Yih Su | Jian Su | Janyce Wiebe | Haizhou Li
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Transliteration Alignment
Vladimir Pervouchine | Haizhou Li | Bo Lin
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Forest-based Tree Sequence to String Translation Model
Hui Zhang | Min Zhang | Haizhou Li | Aiti Aw | Chew Lim Tan
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

A Syntax-Driven Bracketing Model for Phrase-Based Translation
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Topological Ordering of Function Words in Hierarchical Phrase-based Translation
Hendra Setiawan | Min-Yen Kan | Haizhou Li | Philip Resnik
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

A Comparative Study of Hypothesis Alignment and its Improvement for Machine Translation System Combination
Boxing Chen | Min Zhang | Haizhou Li | Aiti Aw
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP

Proceedings of the ACL-IJCNLP 2009 Conference Short Papers
Keh-Yih Su | Jian Su | Janyce Wiebe | Haizhou Li
Proceedings of the ACL-IJCNLP 2009 Conference Short Papers

MARS: Multilingual Access and Retrieval System with Enhanced Query Translation and Document Retrieval
Lianhau Lee | Aiti Aw | Thuy Vu | Sharifah Aljunied Mahani | Min Zhang | Haizhou Li
Proceedings of the ACL-IJCNLP 2009 Software Demonstrations

Tree Kernel-based SVM with Structured Syntactic Knowledge for BTG-based Phrase Reordering
Min Zhang | Haizhou Li
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Fast Translation Rule Matching for Syntax-based Statistical Machine Translation
Hui Zhang | Min Zhang | Haizhou Li | Chew Lim Tan
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

K-Best Combination of Syntactic Parsers
Hui Zhang | Min Zhang | Chew Lim Tan | Haizhou Li
Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing

Efficient Beam Thresholding for Statistical Machine Translation
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of Machine Translation Summit XII: Posters

A Source Dependency Model for Statistical Machine Translation
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of Machine Translation Summit XII: Posters

Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)
Haizhou Li | A Kumaran
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

Report of NEWS 2009 Machine Transliteration Shared Task
Haizhou Li | A Kumaran | Vladimir Pervouchine | Min Zhang
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

Whitepaper of NEWS 2009 Machine Transliteration Shared Task
Haizhou Li | A Kumaran | Min Zhang | Vladimir Pervouchine
Proceedings of the 2009 Named Entities Workshop: Shared Task on Transliteration (NEWS 2009)

I2R’s machine translation system for IWSLT 2009
Xiangyu Duan | Deyi Xiong | Hui Zhang | Min Zhang | Haizhou Li
Proceedings of the 6th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper, we describe the system and approach used by the Institute for Infocomm Research (I2R) for the IWSLT 2009 spoken language translation evaluation campaign. Two kinds of machine translation systems are applied, namely, a phrase-based machine translation system and a syntax-based machine translation system. To test the syntax-based machine translation system on spoken language translation, variational systems are explored. On top of both phrase-based and syntax-based single systems, we further use a rescoring method to improve the individual system performance and a system combination method to combine the strengths of the different individual systems. Rescoring is applied to each single system output, and system combination is applied to all rescoring outputs. Finally, our system combination framework shows better performance on the Chinese-English BTEC task.
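
The abstract above does not describe the rescoring method itself. As a generic illustration only, n-best rescoring is often done by re-ranking hypotheses with a weighted sum of feature scores; the feature names (`tm`, `lm`), weights, and function name below are hypothetical and not taken from the paper.

```python
def rescore(nbest, weights):
    """Re-rank (hypothesis, features) pairs by a weighted sum of features.

    `nbest` is a list of (str, dict) pairs; features missing from `weights`
    are ignored. The best-scoring hypothesis comes first.
    """
    def score(features):
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return sorted(nbest, key=lambda item: score(item[1]), reverse=True)
```

With log-probability features, changing the weights can change which hypothesis ranks first, which is the effect a rescoring pass relies on.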

2008

Regenerating Hypotheses for Statistical Machine Translation
Boxing Chen | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Linguistically Annotated BTG for Statistical Machine Translation
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

Grammar Comparison Study for Translational Equivalence Modeling and Statistical Machine Translation
Min Zhang | Hongfei Jiang | Haizhou Li | Aiti Aw | Sheng Li
Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008)

I2R multi-pass machine translation system for IWSLT 2008.
Boxing Chen | Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

In this paper, we describe the system and approach used by the Institute for Infocomm Research (I2R) for the IWSLT 2008 spoken language translation evaluation campaign. In the system, we integrate various decoding algorithms into a multi-pass translation framework. The multi-pass approach enables us to utilize various decoding algorithms and to explore many more hypotheses. This paper reports our design philosophy, overall architecture, each individual system and the various system combination methods that we have explored. The performance on development and test sets is reported in detail in the paper. The system has shown competitive performance with respect to the BLEU and METEOR measures in the Chinese-English Challenge and BTEC tasks.

The TALP&I2R SMT systems for IWSLT 2008.
Maxim Khalilov | Maria R. Costa-jussà | Carlos A. Henríquez Q. | José A. R. Fonollosa | Adolfo Hernández H. | José B. Mariño | Rafael E. Banchs | Chen Boxing | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of the 5th International Workshop on Spoken Language Translation: Evaluation Campaign

This paper gives a description of the statistical machine translation (SMT) systems developed at the TALP Research Center of the UPC (Universitat Politècnica de Catalunya) for our participation in the IWSLT’08 evaluation campaign. We present Ngram-based (TALPtuples) and phrase-based (TALPphrases) SMT systems. The paper explains the 2008 systems’ architecture and outlines the translation schemes we have used, mainly focusing on the new techniques intended to improve speech-to-speech translation quality. The novelties we have introduced are: an improved reordering method, a linear combination of translation and reordering models, and a new technique for punctuation mark insertion in a phrase-based SMT system. This year we focus on the Arabic-English, Chinese-Spanish and pivot Chinese-(English)-Spanish translation tasks.

NIST 2007 Language Recognition Evaluation: From the Perspective of IIR
Haizhou Li | Bin Ma | Kong-Aik Lee | Khe-Chai Sim | Hanwu Sun | Rong Tong | Donglai Zhu | Changhuai You
Proceedings of the 22nd Pacific Asia Conference on Language, Information and Computation

A Tree Sequence Alignment-based Tree-to-Tree Translation Model
Min Zhang | Hongfei Jiang | Aiti Aw | Haizhou Li | Chew Lim Tan | Sheng Li
Proceedings of ACL-08: HLT

A Linguistically Annotated Reordering Model for BTG-based Statistical Machine Translation
Deyi Xiong | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of ACL-08: HLT, Short Papers

Exploiting N-best Hypotheses for SMT Self-Enhancement
Boxing Chen | Min Zhang | Aiti Aw | Haizhou Li
Proceedings of ACL-08: HLT, Short Papers

Name Origin Recognition Using Maximum Entropy Model and Diverse Features
Min Zhang | Chengjie Sun | Haizhou Li | AiTi Aw | Chew Lim Tan | Xiaolong Wang
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Multi-View Co-Training of Transliteration Model
Jin-Shea Kuo | Haizhou Li
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

Mining Transliterations from Web Query Results: An Incremental Approach
Jin-Shea Kuo | Haizhou Li | Chih-Lung Lin
Proceedings of the Sixth SIGHAN Workshop on Chinese Language Processing

2007

Semantic Transliteration of Personal Names
Haizhou Li | Khe Chai Sim | Jin-Shea Kuo | Minghui Dong
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

Ordering Phrases with Function Words
Hendra Setiawan | Min-Yen Kan | Haizhou Li
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

A Statistical Language Modeling Approach to Lattice-Based Spoken Document Retrieval
Tee Kiah Chia | Haizhou Li | Hwee Tou Ng
Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL)

2006

A Comparative Study of Four Language Identification Systems
Bin Ma | Haizhou Li
International Journal of Computational Linguistics & Chinese Language Processing, Volume 11, Number 2, June 2006

Learning Transliteration Lexicons from the Web
Jin-Shea Kuo | Haizhou Li | Ying-Kuei Yang
Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics

2005

A Phonotactic Language Model for Spoken Language Identification
Haizhou Li | Bin Ma
Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05)

Learning Phrase Translation using Level of Detail Approach
Hendra Setiawan | Haizhou Li | Min Zhang
Proceedings of Machine Translation Summit X: Papers

We propose a simplified Level Of Detail (LOD) algorithm to learn phrase translation for statistical machine translation. In particular, LOD learns unknown phrase translations from parallel texts without linguistic knowledge. LOD uses an agglomerative method to attack the combinatorial explosion that results when generating candidate phrase translations. Although LOD was previously proposed by Setiawan et al. (2005), we improve the original algorithm in two ways: simplifying the algorithm and using a simpler translation model. Experimental results show that our algorithm provides comparable performance while demonstrating a significant reduction in computation time.

Phrase-Based Statistical Machine Translation: A Level of Detail Approach
Hendra Setiawan | Haizhou Li | Min Zhang | Beng Chin Ooi
Second International Joint Conference on Natural Language Processing: Full Papers

A Phrase-Based Context-Dependent Joint Probability Model for Named Entity Translation
Min Zhang | Haizhou Li | Jian Su | Hendra Setiawan
Second International Joint Conference on Natural Language Processing: Full Papers

2004

Direct Orthographical Mapping for Machine Transliteration
Min Zhang | Haizhou Li | Jian Su
COLING 2004: Proceedings of the 20th International Conference on Computational Linguistics

A Joint Source-Channel Model for Machine Transliteration
Haizhou Li | Min Zhang | Jian Su
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

1998

Chinese Word Segmentation
Haizhou Li | Baosheng Yuan
Proceedings of the 12th Pacific Asia Conference on Language, Information and Computation