Yohei Murakami

2022

A Neural Network Approach to Create Minangkabau-Indonesia Bilingual Dictionary
Kartika Resiandi | Yohei Murakami | Arbi Haza Nasution
Proceedings of the 1st Annual Meeting of the ELRA/ISCA Special Interest Group on Under-Resourced Languages

Indonesia has many ethnic languages, most of which belong to the same language family, Austronesian. Because they share that family, words in Indonesian ethnic languages are very similar. However, research indicates that Indonesian ethnic languages are endangered. To help prevent their loss, we propose creating a bilingual dictionary between ethnic languages using a neural network approach that extracts transformation rules with character-level embedding and a Bi-LSTM in a sequence-to-sequence model. The model has an encoder and a decoder. The encoder reads the input sequence character by character, generates context, and extracts a summary of the input. The decoder produces an output sequence in which the character emitted at each time step is conditioned on the previous character. The current translation experiment focuses on Minangkabau and Indonesian, with 13,761 word pairs. The model’s performance is evaluated with 5-fold cross-validation.
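The evaluation setup mentioned in this abstract can be illustrated with a minimal sketch of a 5-fold split over word pairs. The function name `k_fold_splits`, the fixed seed, and the placeholder pairs are assumptions made for illustration; this is not the authors' code, and the model itself is not reproduced here.

```python
import random

def k_fold_splits(items, k=5, seed=0):
    """Yield (train, test) lists for k-fold cross-validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed (assumption) for repeatability
    folds = [shuffled[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

# Placeholder word pairs (hypothetical); the paper's data set is not used here.
pairs = [(f"src{i}", f"tgt{i}") for i in range(10)]
splits = list(k_fold_splits(pairs, k=5))
```

Each pair lands in exactly one test fold, so every pair is used for testing once and for training four times.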

Quality Control for Crowdsourced Bilingual Dictionary in Low-Resource Languages
Hiroki Chida | Yohei Murakami | Mondheera Pituxcoosuvarn
Proceedings of the Thirteenth Language Resources and Evaluation Conference

In conventional crowdsourced bilingual dictionary creation, the main method is to ask multiple workers to translate the same words or sentences and take a majority vote. However, when this method is applied to bilingual dictionaries for low-resource languages with few speakers, many low-quality workers are expected to take part in the majority vote, which makes it difficult to maintain the quality of the evaluation. Therefore, we apply an effective aggregation method that uses hyper questions, each of which is a set of single questions, for quality control. Furthermore, to select high-quality workers, we design a task-allocation method based on worker reliability, which is estimated from the workers' results.
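One possible reading of the aggregation idea in this abstract is sketched below: answers are first voted on as tuples over each set of `k` questions, and the winning tuples are then voted on per question. The function names, the choice of `k=2`, and the made-up answers are all assumptions for illustration; the paper's actual aggregation and task-allocation methods may differ.

```python
from collections import Counter
from itertools import combinations

def majority_vote(answers):
    """Return the most common answer (ties go to the first one seen)."""
    return Counter(answers).most_common(1)[0][0]

def hyper_question_vote(worker_answers, k=2):
    """Vote on every k-subset of questions, then vote per question.

    worker_answers maps a worker name to a list with one answer per question.
    """
    n = len(next(iter(worker_answers.values())))
    per_question = [[] for _ in range(n)]
    for subset in combinations(range(n), k):
        # A hyper question is answered correctly only if all its parts match.
        tuples = [tuple(ans[q] for q in subset) for ans in worker_answers.values()]
        winner = majority_vote(tuples)
        for q, a in zip(subset, winner):
            per_question[q].append(a)
    return [majority_vote(votes) for votes in per_question]

# Made-up answers: two consistent workers and three inconsistent ones.
answers = {
    "expert1": ["A", "A", "A"],
    "expert2": ["A", "A", "A"],
    "worker1": ["B", "B", "C"],
    "worker2": ["B", "C", "A"],
    "worker3": ["B", "A", "B"],
}
plain = [majority_vote([a[q] for a in answers.values()]) for q in range(3)]
hyper = hyper_question_vote(answers, k=2)
```

In this toy data the plain per-question vote picks "B" for the first question, while the tuple-level vote follows the two workers who agree on whole tuples.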

2018

A Framework for Multi-Language Service Design with the Language Grid
Donghui Lin | Yohei Murakami | Toru Ishida
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Designing a Collaborative Process to Create Bilingual Dictionaries of Indonesian Ethnic Languages
Arbi Haza Nasution | Yohei Murakami | Toru Ishida
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

2016

Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)
Yohei Murakami | Donghui Lin | Nancy Ide | James Pustejovsky
Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)

An Ontology for Language Service Composability
Yohei Murakami | Takao Nakaguchi | Donghui Lin | Toru Ishida
Proceedings of the Third International Workshop on Worldwide Language Service Infrastructure and Second Workshop on Open Infrastructures and Analysis Frameworks for Human Language Technologies (WLSI/OIAF4HLT2016)

Fragmentation and recombination are key to creating customized language environments that support various intercultural activities. Fragmentation provides language resource components for the customized language environments, and recombination builds each language environment according to the user’s request by combining these components. To realize this fragmentation and recombination process, existing language resources (both data and programs) should be shared as language services and combined despite mismatches between their service interfaces. To address this issue, standardization is inevitable: standardized interfaces are necessary for language services, as are standardized data formats for language resources. Therefore, we have constructed a hierarchy of language services based on inheritance of service interfaces, which we call the language service ontology. This ontology allows users to create a new customized language service that is compatible with existing ones. Moreover, we have developed a dynamic service binding technology that instantiates various executable customized services from an abstract workflow according to the user’s request. By using the ontology and service binding together, users can bind the instantiated language service to another abstract workflow to create a new customized service.
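The idea of a service hierarchy based on interface inheritance, with composite services that can themselves be bound into other workflows, can be loosely sketched as follows. All class names and the `translate` signature are hypothetical and chosen for illustration; they are not the Language Grid's actual interfaces.

```python
from abc import ABC, abstractmethod

class TranslationService(ABC):
    """Hypothetical standardized interface shared by all translation services."""
    @abstractmethod
    def translate(self, source_lang: str, target_lang: str, text: str) -> str:
        ...

class TaggedTranslator(TranslationService):
    """Stand-in for a concrete service; it only records the language pair."""
    def translate(self, source_lang, target_lang, text):
        return f"{text}|{source_lang}>{target_lang}"

class TwoHopTranslation(TranslationService):
    """Abstract workflow: translate via a pivot language.

    The two services are bound at construction time, and the result
    implements the same interface, so it can be bound into another workflow.
    """
    def __init__(self, pivot_lang, first, second):
        self.pivot_lang = pivot_lang
        self.first = first
        self.second = second

    def translate(self, source_lang, target_lang, text):
        mid = self.first.translate(source_lang, self.pivot_lang, text)
        return self.second.translate(self.pivot_lang, target_lang, mid)

two_hop = TwoHopTranslation("en", TaggedTranslator(), TaggedTranslator())
result = two_hop.translate("ja", "id", "x")
```

Because `TwoHopTranslation` inherits the same interface, it can be passed wherever a `TranslationService` is expected, including as one of the hops of another `TwoHopTranslation`.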

Constraint-Based Bilingual Lexicon Induction for Closely Related Languages
Arbi Haza Nasution | Yohei Murakami | Toru Ishida
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

The lack or absence of parallel and comparable corpora makes bilingual lexicon extraction a difficult task for low-resource languages. Pivot language and cognate recognition approaches have proven useful for inducing bilingual lexicons for such languages. We analyze the features of closely related languages and define a semantic constraint assumption. Based on this assumption, we propose constraint-based bilingual lexicon induction for closely related languages, extending the constraints and translation pair candidates of a recent pivot language approach. We further define three constraint sets based on language characteristics. In this paper, two controlled experiments are conducted. The former involves four closely related language pairs with different language pair similarities, and the latter focuses on sense connectivity between non-pivot words and pivot words. We evaluate our results with F-measure. The results indicate that our method works better on voluminous input dictionaries and highly similar languages. Finally, we introduce a strategy for choosing appropriate constraint sets for different goals and language characteristics.
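The general pivot language idea that this abstract builds on can be sketched as follows: source and target words become translation pair candidates when they share at least one pivot word. This is a deliberately simplified illustration, not the constraint-based method proposed in the paper; the function name and the placeholder words are assumptions.

```python
def pivot_candidates(src_to_pivot, tgt_to_pivot):
    """Map each (source, target) pair to the set of pivot words they share."""
    candidates = {}
    for s, s_pivots in src_to_pivot.items():
        for t, t_pivots in tgt_to_pivot.items():
            shared = set(s_pivots) & set(t_pivots)
            if shared:  # keep only pairs connected through the pivot language
                candidates[(s, t)] = shared
    return candidates

# Placeholder dictionaries (hypothetical words): word -> pivot translations.
src_to_pivot = {"s1": {"p1", "p2"}, "s2": {"p3"}}
tgt_to_pivot = {"t1": {"p1", "p2"}, "t2": {"p2", "p3"}}
cands = pivot_candidates(src_to_pivot, tgt_to_pivot)
```

Pairs that share more pivot words, such as `("s1", "t1")` here, are stronger candidates than pairs that share only one, which is where additional constraints become useful.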

2014

Integration of Workflow and Pipeline for Language Service Composition
Trang Mai Xuan | Yohei Murakami | Donghui Lin | Toru Ishida
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Integrating language resources and language services is a critical part of building natural language processing applications. Service workflows and processing pipelines are two approaches to sharing and combining language resources. Workflow languages focus on expressive power, describing a variety of workflow patterns to meet users’ needs. Users can combine language services in service workflows to meet their requirements. The workflows can be accessed in a distributed manner and invoked independently of the platform. However, workflow languages lack the pipelined execution support needed to improve workflow performance. The processing pipeline, by contrast, provides a straightforward way to create a sequence of linguistic processing steps for analyzing large amounts of text data. It focuses on using pipelined and parallel execution to improve pipeline throughput. However, the resulting pipelines are standalone applications, i.e., software tools that are accessible only on the local machine and that can be run only on the processing pipeline platform. In this paper, we propose an integration framework for the two approaches so that each offsets the disadvantages of the other. We then present a case study in which two representative frameworks, the Language Grid and UIMA, are integrated.

With the development of Internet environments, more and more language services are becoming accessible to ordinary users. However, the gap between human translators and machine translators remains large, especially in localization processes, which require high translation quality. Although efforts to combine human and machine translators to support multilingual communication have been reported in previous research, how to apply such approaches to improving localization processes is rarely discussed. In this paper, we aim to improve localization processes by composing human and machine translation services based on the Language Grid, a language service platform that we have developed. Further, we conduct experiments to compare translation quality and translation cost across several translation processes, including purely machine translation processes, purely human translation processes, and processes that combine human and machine translation services. The experimental results show that composing monolingual roles and dictionary services improves the translation quality of machine translators, and that collaboration between human and machine translators can reduce the cost compared with purely bilingual human translation. We also discuss the generality of the experimental results and the remaining challenges of the proposed localization processes.

Language Service Management with the Language Grid
Yohei Murakami | Donghui Lin | Masahiro Tanaka | Takao Nakaguchi | Toru Ishida
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

As the number of language resources accessible on the Internet increases, many efforts have been made to combine language resources and language processing tools to create new services. However, existing language resource coordination frameworks cannot manage the intellectual property issues associated with language resources, which makes it difficult for most end users to get support for their intercultural collaborations because they always have to deal with these issues themselves. In this paper, we aim to construct a new language service management architecture on the Language Grid that enables language resource providers to control access to their resources in accordance with their own policies. Furthermore, we apply the proposed architecture to the operating Language Grid in order to validate its effectiveness. As a result, several service management models that use monitoring and access constraints have emerged to satisfy various requirements from language resource providers. These models can handle paid-for language resources as well as free ones. Finally, we discuss the remaining challenges of combining language resources that are subject to different policies.