2021
Applying Masked Language Models to Search for Suitable Verbs Used in Academic Writing
Chooi Ling Goh
Proceedings of the 35th Pacific Asia Conference on Language, Information and Computation
2011
Rule-based Reordering Constraints for Phrase-based SMT
Chooi-Ling Goh
|
Takashi Onishi
|
Eiichiro Sumita
Proceedings of the 15th Annual Conference of the European Association for Machine Translation
The NICT translation system for IWSLT 2011
Andrew Finch
|
Chooi-Ling Goh
|
Graham Neubig
|
Eiichiro Sumita
Proceedings of the 8th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes NICT’s participation in the IWSLT 2011 evaluation campaign for the TED speech translation Chinese-English shared task. Our approach was based on a phrase-based statistical machine translation system that was augmented in two ways. Firstly, we introduced rule-based reordering constraints on the decoding. These consisted of a set of rules that were used to split the input utterances into segments that could be decoded almost independently. The idea here is that constraining the decoding process in this manner would greatly reduce the search space of the decoder and cut out many possibilities for error, while at the same time allowing a correct output to be generated. The rules we used exploit punctuation and spacing in the input utterances, and we use these positions to delimit our segments. Not all punctuation/spacing positions were used as segment boundaries; the set of positions used was determined by a set of linguistically based heuristics. Secondly, we used two heterogeneous methods to build the translation model and the lexical reordering model for our systems. The first method employed the popular approach of using GIZA++ for alignment in combination with phrase-extraction heuristics. The second method used a recently developed Bayesian alignment technique that is able to perform both phrase-to-phrase alignment and phrase pair extraction within a single unsupervised process. The models produced by this type of alignment technique are typically very compact whilst at the same time maintaining a high level of translation quality. We evaluated both of these methods of translation model construction in isolation, and our results show their performance is comparable. We also integrated both models by linear interpolation to obtain a model that outperforms either component. Finally, we added an indicator feature into the log-linear model to indicate those phrases that were in the intersection of the two translation models. The addition of this feature was also able to provide a small improvement in performance.
2010
The NICT translation system for IWSLT 2010
Chooi-Ling Goh
|
Taro Watanabe
|
Michael Paul
|
Andrew Finch
|
Eiichiro Sumita
Proceedings of the 7th International Workshop on Spoken Language Translation: Evaluation Campaign
This paper describes NICT’s participation in the IWSLT 2010 evaluation campaign for the DIALOG translation (Chinese-English) and the BTEC (French-English) translation shared tasks. For the DIALOG task, the main challenge is applying context information during translation. Context information can be used to decide on word choice and also to replace missing information during translation. We applied discriminative reranking using contextual information as additional features. In order to provide more choices for reranking, we generated n-best lists from multiple phrase-based statistical machine translation systems that varied in the Chinese word segmentation scheme used. We also built a model that merged the phrase tables generated by the different segmentation schemes. Furthermore, we used a lattice-based system combination model to combine the output from different systems. A combination of all of these systems was used to produce the n-best lists for reranking. For the BTEC task, we took a general approach that used lattice-based system combination of two systems: a standard phrase-based system and a hierarchical phrase-based system. We also tried to handle some unknown words by replacing them with differently inflected forms of the same words that are known to the system.
2009
Towards automatic acquisition of linguistic features
Yves Lepage
|
Chooi Ling Goh
Proceedings of the 17th Nordic Conference of Computational Linguistics (NODALIDA 2009)
2006
The Construction of a Dictionary for a Two-layer Chinese Morphological Analyzer
Chooi-Ling Goh
|
Jia Lü
|
Yuchang Cheng
|
Masayuki Asahara
|
Yuji Matsumoto
Proceedings of the 20th Pacific Asia Conference on Language, Information and Computation
2005
Building a Japanese-Chinese Dictionary Using Kanji/Hanzi Conversion
Chooi-Ling Goh
|
Masayuki Asahara
|
Yuji Matsumoto
Second International Joint Conference on Natural Language Processing: Full Papers
Combination of Machine Learning Methods for Optimum Chinese Word Segmentation
Masayuki Asahara
|
Kenta Fukuoka
|
Ai Azuma
|
Chooi-Ling Goh
|
Yotaro Watanabe
|
Yuji Matsumoto
|
Takashi Tsuzuki
Proceedings of the Fourth SIGHAN Workshop on Chinese Language Processing
Chinese Word Segmentation by Classification of Characters
Chooi-Ling Goh
|
Masayuki Asahara
|
Yuji Matsumoto
International Journal of Computational Linguistics & Chinese Language Processing, Volume 10, Number 3, September 2005: Special Issue on Selected Papers from ROCLING XVI
2004
Chinese Word Segmentation by Classification of Characters
Chooi-Ling Goh
|
Masayuki Asahara
|
Yuji Matsumoto
Proceedings of the Third SIGHAN Workshop on Chinese Language Processing
Pruning False Unknown Words to Improve Chinese Word Segmentation
Chooi-Ling Goh
|
Masayuki Asahara
|
Yuji Matsumoto
Proceedings of the 18th Pacific Asia Conference on Language, Information and Computation
2003
Chinese Unknown Word Identification Using Character-based Tagging and Chunking
Chooi Ling Goh
|
Masayuki Asahara
|
Yuji Matsumoto
The Companion Volume to the Proceedings of 41st Annual Meeting of the Association for Computational Linguistics
Combining Segmenter and Chunker for Chinese Word Segmentation
Masayuki Asahara
|
Chooi Ling Goh
|
Xiaojie Wang
|
Yuji Matsumoto
Proceedings of the Second SIGHAN Workshop on Chinese Language Processing