Yuji Yamamoto
2026
Timesteps of Mamba Align with Human Reading Times
Yuji Yamamoto | Shinnosuke Isono | Yoshinobu Kawahara | Sho Yokoi
Findings of the Association for Computational Linguistics: ACL 2026
This study demonstrates an alignment between per-word processing time in Mamba, a popular state-space language model, and in human readers. In Mamba, the recurrent state transition at each layer conceptually takes some duration of time, the discretization timestep Δt, which is determined dynamically in response to the input. Using a naturalistic reading dataset, we show that the per-word timestep from Mamba is a powerful predictor of human reading times, comparable to strong baselines such as word frequency and GPT-2 surprisal, and significant even when they are controlled for. We further suggest, through formal analysis of Mamba’s architecture and internal dynamics, that Mamba can serve as a new, valuable lens on human real-time language processing with ever-updated memory, because it lets us examine how each module (layer) weighs short- and long-term information retention, and how noise may interact with dynamic, continuous memory representation. Code is available via an (anonymized) link.
2023
Absolute Position Embedding Learns Sinusoid-like Waves for Attention Based on Relative Position
Yuji Yamamoto | Takuya Matsuzaki
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing
Attention weights are a clue for interpreting how a Transformer-based model makes an inference. In some attention heads, the attention focuses on the neighbors of each token. This allows the output vector of each token to depend on the surrounding tokens and contributes to making the inference context-dependent. We analyze the mechanism behind the concentration of attention on nearby tokens. We show that the phenomenon emerges as follows: (1) the learned position embedding has sinusoid-like components, (2) such components are transmitted to the query and the key in the self-attention, and (3) the attention head shifts the phases of the sinusoid-like components so that the attention concentrates on nearby tokens at specific relative positions. In other words, a certain type of Transformer-based model acquires the sinusoidal positional encoding to some extent on its own through Masked Language Modeling.
2017
Rule-based MT and UTX Glossary Management – Honda’s Case Dealing with Thousands of Technical Terms
Saemi Hirayama | Yuji Yamamoto
Proceedings of Machine Translation Summit XVI: Commercial MT Users and Translators Track
2011
UTX 1.11, a Simple and Open User Dictionary/Terminology Standard, and its Effectiveness with Multiple MT Systems
Seiji Okura | Yuji Yamamoto | Hajime Ito | Michael Kato | Miwako Shimazu
Proceedings of Machine Translation Summit XIII: Papers
2008
Sharing User Dictionaries Across Multiple Systems with UTX-S
Francis Bond | Seiji Okura | Yuji Yamamoto | Toshiki Murata | Kiyotaka Uchimoto | Michael Kato | Miwako Shimazu | Tsugiyoshi Suzuki
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Government and Commercial Uses of MT
Careful tuning of user-created dictionaries is indispensable when using a machine translation system for computer-aided translation. However, there is no widely used standard for user dictionaries in the Japanese/English machine translation market. To address this issue, AAMT (the Asia-Pacific Association for Machine Translation) has established a specification of sharable dictionaries (UTX-S: Universal Terminology eXchange -- Simple), which can be used across different machine translation systems, thus increasing the interoperability of language resources. UTX-S is simpler than existing specifications such as UPF and OLIF. It was explicitly designed to make it easy to (a) add new user dictionaries and (b) share existing user dictionaries. This facilitates rapid user dictionary production and avoids vendor tie-in. In this study we describe the UTX-Simple (UTX-S) format, and show that it can be converted to the user dictionary formats for five commercial English-Japanese MT systems. We then present a case study where we (a) convert an on-line glossary to UTX-S, and (b) produce user dictionaries for five different systems, and then exchange them. The results show that the simplified format of UTX-S can be used to rapidly build dictionaries. Further, we confirm that customized user dictionaries are effective across systems, although with a slight loss in quality: on average, user dictionaries improved 44.8% of translations with the systems they were built for and 37.3% of translations with different systems. In ongoing work, AAMT is using UTX-S as the format in building up a user community for producing, sharing, and accumulating user dictionaries in a sustainable way.