Takako Aikawa

2012

How Good Is Crowd Post-Editing? Its Potential and Limitations
Midori Tatsumi | Takako Aikawa | Kentaro Yamamoto | Hitoshi Isahara
Workshop on Post-Editing Technology and Practice

This paper is a partial report of a research effort on evaluating the effect of crowd-sourced post-editing. We first discuss the emerging trend of crowd-sourced post-editing of machine translation output, along with its benefits and drawbacks. Second, we describe the pilot study we have conducted on a platform that facilitates crowd-sourced post-editing. Finally, we outline our plans for further studies to gain more insight into how effective crowd-sourced post-editing is.

A statistical machine translation (SMT) system requires homogeneous training data in order to produce domain-sensitive (or context-sensitive) terminology translations. If the data covers various domains, it is difficult for an SMT system to learn context-sensitive terminology mappings probabilistically. Yet terminology translation accuracy is an important issue for MT users. This paper explores an approach to tackling this terminology translation problem for an SMT system. We propose a way to identify terminology translations in MT output and automatically swap them with user-defined translations. Our approach is simple and can be applied to any type of MT system. We call our prototype Term Swapper. Term Swapper allows MT users to draw on their own dictionaries without affecting any part of the MT output except the terminology translation(s) in question. Using an SMT system developed at Microsoft Research, called MSR-MT (Quirk et al., 2005; Menezes & Quirk, 2005), we conducted initial experiments to investigate the coverage rate of Term Swapper and its impact on the overall quality of MT output. The results from our experiments show high coverage and a positive impact on overall MT quality.
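The abstract does not say how Term Swapper locates a term's translation in the MT output. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a word alignment between source and MT tokens is available (hypothetical input), that dictionary entries are single source tokens, and it replaces only the aligned target tokens, leaving the rest of the output untouched. The names `swap_terms`, `alignment` and `user_dict` are hypothetical.

```python
def swap_terms(source_tokens, mt_tokens, alignment, user_dict):
    """Replace the MT translation of each dictionary term with the user's translation.

    Assumptions (hypothetical, not from the paper):
      alignment: list of (source_index, target_index) word-alignment pairs
      user_dict: maps a single source token to its preferred target string
    """
    replacements = {}  # first aligned target index -> user-defined translation
    drop = set()       # remaining target indices covered by a swapped term
    for i, tok in enumerate(source_tokens):
        if tok not in user_dict:
            continue
        targets = sorted(t for s, t in alignment if s == i)
        if not targets:
            continue  # term is unaligned: leave the MT output as it is
        replacements[targets[0]] = user_dict[tok]
        drop.update(targets[1:])
    out = []
    for j, tok in enumerate(mt_tokens):
        if j in replacements:
            out.append(replacements[j])
        elif j not in drop:
            out.append(tok)  # every non-term token is kept unchanged
    return out


src = ["click", "the", "button"]
mt = ["haga", "clic", "en", "el", "botón"]
align = [(0, 0), (0, 1), (1, 3), (2, 4)]
print(swap_terms(src, mt, align, {"button": "pulsador"}))
```

Anchoring the swap on alignment links is what keeps the edit local: tokens that are not aligned to a dictionary term are copied through verbatim, which mirrors the stated goal of not affecting the rest of the MT output.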

2007

Impact of controlled language on translation quality and post-editing in a statistical machine translation environment
Takako Aikawa | Lee Schwartz | Ronit King | Mo Corston-Oliver | Carmen Lozano
Proceedings of Machine Translation Summit XI: Papers

Automatic validation of terminology translation consistency with statistical method
Masaki Itagaki | Takako Aikawa | Xiaodong He
Proceedings of Machine Translation Summit XI: Papers

2006

Detecting Inter-domain Semantic Shift using Syntactic Similarity
Masaki Itagaki | Anthony Aue | Takako Aikawa
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This poster is a preliminary report of our experiments for detecting semantically shifted terms between different domains for the purposes of new concept extraction. A given term in one domain may represent a different concept in another domain. In our approach, we quantify the degree of similarity of words between different domains by measuring the degree of overlap in their domain-specific semantic spaces. The domain-specific semantic spaces are defined by extracting families of syntactically similar words, i.e. words that occur in the same syntactic context. Our method does not rely on any external resources other than a syntactic parser. Yet it has the potential to extract semantically shifted terms between two different domains automatically while paying close attention to contextual information. The organization of the poster is as follows: Section 1 provides our motivation. Section 2 provides an overview of our NLP technology and explains how we extract syntactically similar words. Section 3 describes the design of our experiments and our method. Section 4 provides our observations and preliminary results. Section 5 presents some work to be done in the future and concluding remarks.
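The overlap measure described above can be sketched as follows. This is a minimal illustration, not the authors' method: it assumes parser output is available as hypothetical `(dependent, relation, head)` triples, treats two words as syntactically similar when they share at least one `(relation, head)` context, and uses Jaccard overlap as one plausible choice of overlap measure; the abstract does not specify which measure is used.

```python
from collections import defaultdict


def similar_words(triples, word):
    """Words sharing at least one (relation, head) syntactic context with `word`.

    `triples` is a hypothetical list of (dependent, relation, head) tuples,
    e.g. as produced by a syntactic parser over one domain's corpus.
    """
    contexts = defaultdict(set)
    for dependent, relation, head in triples:
        contexts[dependent].add((relation, head))
    target = contexts.get(word, set())
    return {w for w, ctx in contexts.items() if w != word and ctx & target}


def semantic_shift_score(word, triples_a, triples_b):
    """1 - Jaccard overlap of the word's similar-word sets in two domains.

    Returns None when the word has no similar words in one of the domains,
    since there is then no evidence to compare.
    """
    a = similar_words(triples_a, word)
    b = similar_words(triples_b, word)
    if not a or not b:
        return None
    return 1.0 - len(a & b) / len(a | b)


general = [("mouse", "subj", "eat"), ("cat", "subj", "eat"), ("rat", "subj", "eat")]
computing = [("mouse", "obj", "click"), ("button", "obj", "click"), ("link", "obj", "click")]
print(semantic_shift_score("mouse", general, computing))
```

A score near 1 flags a candidate semantically shifted term (here "mouse" keeps company with animals in one domain and with interface objects in the other), while a score near 0 suggests the term behaves alike in both domains.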

Predicting MT Quality as a Function of the Source Language
David M. Rojas | Takako Aikawa
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

This paper describes one phase of a large-scale machine translation (MT) quality assurance project. We explore a novel approach to discriminating MT-unsuitable source sentences by predicting the expected quality of the output. The resources required include a set of source/MT sentence pairs, human judgments on the output, a source parser, and an MT system. We extract a number of syntactic, semantic, and lexical features from the source sentences only and train a classifier that we call the Syntactic, Semantic, and Lexical Model (SSLM) (cf. Gamon et al., 2005; Liu & Gildea, 2005; Rajman & Hartley, 2001). Despite the simplicity of the approach, SSLM scores correlate with human judgments and can help determine whether sentences are suitable or unsuitable for translation by our MT system. SSLM also provides information about which source features impact MT quality, connecting this work with the field of controlled language (CL) (cf. Reuther, 2003; Nyberg & Mitamura, 1996). With a focus on the input side of MT, SSLM differs greatly from evaluation approaches such as BLEU (Papineni et al., 2002), NIST (Doddington, 2002) and METEOR (Banerjee & Lavie, 2005) in that these other systems compare MT output with reference sentences for evaluation and do not provide feedback regarding potentially problematic source material. Our method bridges the research areas of CL and MT evaluation by addressing the importance of providing MT-suitable English input to enhance output quality.

2004

Multilingual Corpus-based Approach to the Resolution of English -ing
Lee Schwartz | Takako Aikawa
Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC’04)

2003

Disambiguation of English PP attachment using multilingual aligned data
Lee Schwartz | Takako Aikawa | Chris Quirk
Proceedings of Machine Translation Summit IX: Papers

Prepositional phrase attachment (PP attachment) is a major source of ambiguity in English. It poses a substantial challenge to Machine Translation (MT) between English and languages that are not characterized by PP attachment ambiguity. In this paper we present an unsupervised, bilingual, corpus-based approach to the resolution of English PP attachment ambiguity. As data we use aligned linguistic representations of the English and Japanese sentences from a large parallel corpus of technical texts. The premise of our approach is that with large aligned, parsed, bilingual (or multilingual) corpora, languages can learn non-trivial linguistic information from one another with high accuracy. We contend that our approach can be extended to linguistic phenomena other than PP attachment.

2002

Combining Machine Learning and Rule-based Approaches in Spanish and Japanese Sentence Realization
Maite Melero | Takako Aikawa | Lee Schwartz
Proceedings of the International Natural Language Generation Conference

2001

Generation for multilingual MT
Takako Aikawa | Maite Melero | Lee Schwartz | Andi Wu
Proceedings of Machine Translation Summit VIII

This paper presents an overview of the broad-coverage, application-independent natural language generation component of the NLP system being developed at Microsoft Research. It demonstrates how this component functions within a multilingual Machine Translation system (MSR-MT), using the languages that we are currently working on (English, Spanish, Japanese, and Chinese). Section 1 provides a system description of MSR-MT. Section 2 focuses on the generation component and its set of core rules. Section 3 describes an additional layer of generation rules with examples that address issues specific to MT. Section 4 presents evaluation results in the context of MSR-MT. Section 5 addresses generation issues outside of MT.

Multilingual Sentence Generation
Takako Aikawa | Maite Melero | Lee Schwartz | Andi Wu
Proceedings of the ACL 2001 Eighth European Workshop on Natural Language Generation (EWNLG)