Ikki Ohmukai


2024

The Metronome Approach to Sanskrit Meter: Analysis for the Rigveda
Yuzuki Tsukagoshi | Ikki Ohmukai
Proceedings of the 1st Workshop on Machine Learning for Ancient Languages (ML4AL 2024)

This study analyzes the verses of the Rigveda, the oldest Sanskrit text, from a metrical perspective. Based on their metrical structure, the verses are represented by four elements: light syllables, heavy syllables, word boundaries, and line boundaries. The analysis reveals that verses traditionally categorized under the same metrical name can form distinct clusters. It also uncovers commonalities in metrical structure: verses with similar metrical patterns group together despite differing numbers of lines. This methodology is expected to enable comparisons across multiple languages within the Indo-European family.
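As a rough illustration of the four-element representation the abstract describes, here is a minimal Python sketch. The symbol choices ("L", "H", "|", "/"), the encoding function, and the sample verse are all assumptions for illustration, not the paper's actual scheme or data.

```python
# Hypothetical encoding of a verse's metrical structure as a string over
# four symbols: "L" (light syllable), "H" (heavy syllable),
# "|" (word boundary), "/" (line boundary). Illustrative only.

def encode_verse(lines):
    """Encode a verse as a flat symbol sequence.

    `lines` is a list of lines; each line is a list of words; each word
    is a string over {"L", "H"} giving its light/heavy syllable pattern.
    Word boundaries become "|", line boundaries become "/".
    """
    encoded_lines = ["|".join(words) for words in lines]
    return "/".join(encoded_lines)

# A made-up two-line verse with two words per line.
verse = [
    ["LHL", "HHLH"],
    ["HLL", "LHHH"],
]
print(encode_verse(verse))  # LHL|HHLH/HLL|LHHH
```

Sequences encoded this way can then be compared or clustered with standard string-similarity methods, which is in the spirit of the clustering analysis the abstract reports.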

2022

A Japanese Masked Language Model for Academic Domain
Hiroki Yamauchi | Tomoyuki Kajiwara | Marie Katsurai | Ikki Ohmukai | Takashi Ninomiya
Proceedings of the Third Workshop on Scholarly Document Processing

We release a pretrained Japanese masked language model for the academic domain. Pretrained masked language models have recently improved the performance of various natural language processing applications, and in domains with many technical terms, such as medicine and academia, domain-specific pretraining is effective. While domain-specific Japanese masked language models are widely used for the medical and SNS domains, alongside domain-independent ones, no pretrained model specific to the academic domain has been publicly available. In this study, we pretrained a RoBERTa-based Japanese masked language model on paper abstracts from the academic database CiNii Articles. Experiments on Japanese text classification in the academic domain show that the proposed model outperforms existing pretrained models.
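To illustrate how such a released masked language model would typically be used, here is a minimal sketch with the Hugging Face transformers fill-mask pipeline. The model identifier below is a placeholder, not the actual name of the released model, and the example sentence is my own.

```python
# Minimal sketch of fill-mask prediction with a pretrained Japanese
# masked language model via Hugging Face transformers.
from transformers import pipeline

# Placeholder model ID; substitute the actual released checkpoint.
fill_mask = pipeline("fill-mask", model="path/to/academic-japanese-roberta")

# Japanese sentence with a masked token. The mask token string depends on
# the tokenizer ("<mask>" is typical for RoBERTa-style models).
# The sentence reads roughly: "In this study, we propose a <mask> model."
for pred in fill_mask("本研究では<mask>モデルを提案する。"):
    print(pred["token_str"], pred["score"])
```

A model pretrained on academic abstracts would be expected to rank technical terms higher in such predictions than a general-domain model, which is the motivation the abstract gives for domain-specific pretraining.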