Kyo Kageura

2020

pdf bib
Proceedings of the 6th International Workshop on Computational Terminology
Béatrice Daille | Kyo Kageura | Ayla Rigouts Terryn
Proceedings of the 6th International Workshop on Computational Terminology

pdf bib abs
Multilingualization of Medical Terminology: Semantic and Structural Embedding Approaches
Long-Huei Chen | Kyo Kageura
Proceedings of the 12th Language Resources and Evaluation Conference

The multilingualization of terminology is an essential step in the translation pipeline, to ensure the correct transfer of domain-specific concepts. Many institutions and language service providers construct and maintain multilingual terminologies, which constitute important assets. However, the curation of such multilingual resources requires significant human effort; though automatic multilingual term extraction methods have been proposed, they have had limited success, as term translation is not achieved by simply conveying meaning but requires terminologists’ and domain experts’ knowledge to fit the term within the existing terminology. Here we propose a method to encode the structural properties of terms by aligning their embeddings using graph convolutional networks trained on separate languages. We observe that the structural information can augment the semantic methods also explored in this work, and recognize that the unique nature of terminologies allows our method to take full advantage of it and produce superior results.

2019

pdf bib
Translating Terminologies: A Comparative Examination of NMT and PBSMT Systems
Long-Huei Chen | Kyo Kageura
Proceedings of Machine Translation Summit XVII: Translator, Project and User Tracks

pdf bib abs
Entropic characterisation of termino-conceptual structure : A preliminary study
Kyo Kageura | Long-Huei Chen
Actes de la Conférence sur le Traitement Automatique des Langues Naturelles (TALN) PFIA 2019. Terminologie et Intelligence Artificielle (atelier TALN-RECITAL & IC)

Terms represent concepts, which consist of conceptual characteristics. In actual concept-term formation, which is done by researchers, the process is in reverse: conceptual elements/characteristics are consolidated to form concepts, which are represented by terms. As concepts do not exist on the fly, what we may call the termino-conceptual system provides scaffolding in this process. Terminologists, both in practice and in research, do not only collect and list terms but also analyse, describe and define terms and systematise terminologies. To carry out these tasks, terminologists must refer to conceptual systems, to the extent that they contribute to systematising terminologies; terminologists thus also deal with the sphere of the termino-conceptual system. In this paper, we consolidate the status of the termino-conceptual sphere and propose a way to characterise the structure of the termino-conceptual system by using entropy. The entropic characterisation of English terminologies of six domains, i.e. agriculture, botany, chemistry, computer science, physics and psychology, is presented.

2017

pdf bib abs
Consistent Classification of Translation Revisions: A Case Study of English-Japanese Student Translations
Atsushi Fujita | Kikuko Tanabe | Chiho Toyoshima | Mayuka Yamamoto | Kyo Kageura | Anthony Hartley
Proceedings of the 11th Linguistic Annotation Workshop

Consistency is a crucial requirement in text annotation. It is especially important in educational applications, as lack of consistency directly affects learners’ motivation and learning performance. This paper presents a quality assessment scheme for English-to-Japanese translations produced by learner translators at university. We constructed a revision typology and a decision tree manually through an application of the OntoNotes method, i.e., an iteration of assessing learners’ translations and hypothesizing the conditions for consistent decision making, as well as re-organizing the typology. Intrinsic evaluation of the created scheme confirmed its potential contribution to the consistent classification of identified erroneous text spans, achieving visibly higher Cohen’s kappa values, up to 0.831, than previous work. This paper also describes an application of our scheme to an English-to-Japanese translation exercise course for undergraduate students at a university in Japan.

pdf bib abs
‘Fighting’ or ‘Conflict’? An Approach to Revealing Concepts of Terms in Political Discourse
Linyuan Tang | Kyo Kageura
Proceedings of the 2017 EMNLP Workshop: Natural Language Processing meets Journalism

Previous work on the epistemology of fact-checking indicated the dilemma between the public’s need for binary answers and the ambiguity of political discussion. Determining concepts represented by terms in political discourse can be considered as a Word-Sense Disambiguation (WSD) task. The analysis of political discourse, however, requires identifying precise concepts of terms from relatively small data. This work attempts to provide a basic framework for revealing concepts of terms in political discourse with explicit contextual information. The framework consists of three parts: 1) extracting important terms, 2) generating concordance for each term with stipulative definitions and explanations, and 3) agglomerating similar information of the term by hierarchical clustering. Utterances made by Prime Minister Abe Shinzo in the Diet of Japan are used to examine our framework. Importantly, we revealed the conceptual inconsistency of the term Sonritsu-kiki-jitai. The framework proved to work, but only for a small number of terms due to the lack of explicit contextual information.

2016

pdf bib
Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)
Patrick Drouin | Natalia Grabar | Thierry Hamon | Kyo Kageura | Koichi Takeuchi
Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)

pdf bib abs
A Method of Augmenting Bilingual Terminology by Taking Advantage of the Conceptual Systematicity of Terminologies
Miki Iwai | Koichi Takeuchi | Kyo Kageura | Kazuya Ishibashi
Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)

In this paper, we propose a method of augmenting existing bilingual terminologies. Our method belongs to a “generate and validate” framework rather than extraction from corpora. Although many studies have proposed methods to find term translations or to augment terminology within a “generate and validate” framework, few have taken full advantage of the systematic nature of terminologies. A terminology of a domain represents the conceptual system of the domain fairly systematically, and we contend that making full use of this systematicity will greatly contribute to the effective augmentation of terminologies. This paper proposes and evaluates a novel method to generate bilingual term candidates by using existing terminologies and delving into their systematicity. Experiments have shown that our method can generate much better term candidate pairs than the existing method and give improved performance for terminology augmentation.

pdf bib abs
Constructing and Evaluating Controlled Bilingual Terminologies
Rei Miyata | Kyo Kageura
Proceedings of the 5th International Workshop on Computational Terminology (Computerm2016)

This paper presents the construction and evaluation of Japanese and English controlled bilingual terminologies that are particularly intended for controlled authoring and machine translation with special reference to the Japanese municipal domain. Our terminologies are constructed by extracting terms from municipal website texts, and the term variations are controlled by defining preferred and proscribed terms for both the source Japanese and the target English. To assess the coverage of the terms/concepts in the municipal domain and validate the quality of the control, we employ a quantitative extrapolation method that estimates the potential vocabulary size. Using Large-Number-of-Rare-Event (LNRE) modelling, we compare two parameters: (1) uncontrolled and controlled and (2) Japanese and English. The results show that our terminologies currently cover about 45–65% of the terms and 50–65% of the concepts in the municipal domain, and are well controlled. The detailed analysis of growth patterns of terminologies also provides insight into the extent to which we can enlarge the terminologies within the realistic range.

pdf bib abs
MuTUAL: A Controlled Authoring Support System Enabling Contextual Machine Translation
Rei Miyata | Anthony Hartley | Kyo Kageura | Cécile Paris | Masao Utiyama | Eiichiro Sumita
Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations

The paper introduces a web-based authoring support system, MuTUAL, which aims to help writers create multilingual texts. The highlighted feature of the system is that it enables machine translation (MT) to generate outputs appropriate to their functional context within the target document. Our system is operational online, implementing core mechanisms for document structuring and controlled writing. These include a topic template and a controlled language authoring assistant, linked to our statistical MT system.

In this paper we report a way of constructing a translation corpus that contains not only source and target texts, but also draft and final versions of target texts, through the translation hosting site Minna no Hon'yaku (MNH). We made MNH publicly available in April 2009. Since then, more than 1,000 users have registered and over 3,500 documents have been translated, as of February 2010, from English to Japanese and from Japanese to English. MNH provides an integrated translation-aid environment, QRedit, which enables translators to look up high-quality dictionaries and Wikipedia as well as to search Google seamlessly. As MNH keeps translation logs, a corpus consisting of source texts, draft translations in several versions, and final translations is constructed naturally through MNH. As of 7 February, 764 documents with multiple translation versions have been accumulated, of which 110 were edited by more than one translator. This corpus can be used for self-learning by inexperienced translators on MNH, and potentially for improving machine translation.

pdf bib
Multilingual Lexical Network from the Archives of the Digital Silk Road
Hans-Mohammad Daoud | Kyo Kageura | Christian Boitet | Asanobu Kitamoto | Mathieu Mangeot
Proceedings of the 6th Workshop on Ontologies and Lexical Resources

pdf bib
Helping Volunteer Translators, Fostering Language Resources
Masao Utiyama | Takeshi Abekawa | Eiichiro Sumita | Kyo Kageura
Proceedings of the 2nd Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources

pdf bib
Being Theoretical is Being Practical: Multiword Units and Terminological Structure Revitalised
Kyo Kageura
Proceedings of the 2010 Workshop on Multiword Expressions: from Theory to Applications

2009

pdf bib
Minna no Hon’yaku: a website for hosting, archiving, and promoting translations
Masao Utiyama | Takeshi Abekawa | Eiichiro Sumita | Kyo Kageura
Proceedings of Translating and the Computer 31

pdf bib
Anchor Points for Bilingual Lexicon Extraction from Small Comparable Corpora
Emmanuel Prochasson | Emmanuel Morin | Kyo Kageura
Proceedings of Machine Translation Summit XII: Posters

pdf bib
Hosting Volunteer Translators
Masao Utiyama | Takeshi Abekawa | Eiichiro Sumita | Kyo Kageura
Proceedings of Machine Translation Summit XII: Posters

2008

pdf bib abs
Constructing a Corpus that Indicates Patterns of Modification between Draft and Final Translations by Human Translators
Takeshi Abekawa | Kyo Kageura
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

In human translation, translators first make draft translations and then modify and edit them. In the case of experienced translators, this process involves the use of wide-ranging expert knowledge, which has mostly remained implicit so far. Describing the difference between draft and final translations, therefore, should contribute to making this knowledge explicit. If we could clarify the expert knowledge of translators, hopefully in a computationally tractable way, we would be able to contribute to the automatic notification of awkward translations to assist inexperienced translators, improving the quality of MT output, etc. Against this backdrop, we have started constructing a corpus that indicates patterns of modification between draft and final translations made by human translators. This paper reports on our progress to date.

pdf bib
What Prompts Translators to Modify Draft Translations? An Analysis of Basic Modification Patterns for Use in the Automatic Notification of Awkwardly Translated Text
Takeshi Abekawa | Kyo Kageura
Proceedings of the Third International Joint Conference on Natural Language Processing: Volume-I

2007

pdf bib
Bilingual Terminology Mining - Using Brain, not brawn comparable corpora
Emmanuel Morin | Béatrice Daille | Koichi Takeuchi | Kyo Kageura
Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics

pdf bib
A Translation Aid System with a Stratified Lookup Interface
Takeshi Abekawa | Kyo Kageura
Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics Companion Volume Proceedings of the Demo and Poster Sessions

pdf bib
BEYTrans: A Free Online Collaborative Wiki-Based CAT Environment Designed for Online Translation Communities
Youcef Bey | Kyo Kageura | Christian Boitet
Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation

pdf bib
Exploring the Microscopic Textual Characteristics of Japanese Prime Ministers’ Diet Addresses by Measuring the Quantity and Diversity of Nouns
Takafumi Suzuki | Kyo Kageura
Proceedings of the 21st Pacific Asia Conference on Language, Information and Computation

pdf bib
Flexible automatic look-up of English idiom entries in dictionaries
Koichi Takeuchi | Takashi Kanehila | Kazuki Hilao | Takeshi Abekawa | Kyo Kageura
Proceedings of Machine Translation Summit XI: Papers

2006

pdf bib abs
A Self-Referring Quantitative Evaluation of the ATR Basic Travel Expression Corpus (BTEC)
Kyo Kageura | Genichiro Kikui
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)

In this paper we evaluate the Basic Travel Expression Corpus (BTEC), developed by ATR (Advanced Telecommunication Research Laboratory), Japan. BTEC was specifically developed as a wide-coverage, consistent corpus containing basic Japanese travel expressions with English counterparts, for the purpose of providing basic data for the development of high quality speech translation systems. To evaluate the corpus, we introduce a quantitative method for evaluating the sufficiency of qualitatively well-defined corpora, on the basis of LNRE methods that can estimate the potential growth patterns of various sparse data by fitting various skewed distributions such as the Zipfian group of distributions, lognormal distribution, and inverse Gauss-Poisson distribution to them. The analyses show the coverage of lexical items of BTEC vis-a-vis the possible targets implicitly defined by the corpus itself, and thus provide basic insights into strategies for enhancing BTEC in future.

pdf bib
Data Management in QRLex, an Online Aid System for Volunteer Translators’
Youcef Bey | Kyo Kageura | Christian Boitet
International Journal of Computational Linguistics & Chinese Language Processing, Volume 11, Number 4, December 2006