Yoshiki Mikami



2012

Stemming Tigrinya Words for Information Retrieval
Omer Osman | Yoshiki Mikami
Proceedings of COLING 2012: Demonstration Papers

2008

A Rule-based Syllable Segmentation of Myanmar Text
Zin Maung Maung | Yoshiki Mikami
Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages

Technical Terminology in Asian Languages: Different Approaches to Adopting Engineering Terms
Makiko Matsuda | Tomoe Takahashi | Hiroki Goto | Yoshikazu Hayase | Robin Lee Nagano | Yoshiki Mikami
Proceedings of the 6th Workshop on Asian Language Resources

The Link Structure of Language Communities and its Implication for Language-specific Crawling
Rizza Caminero | Yoshiki Mikami
Proceedings of the 6th Workshop on Asian Language Resources

2005

Language and Encoding Scheme Identification of Extremely Large Sets of Multilingual Text
Pavol Zavarsky | Yoshiki Mikami | Shota Wada
Proceedings of Machine Translation Summit X: Posters

In this paper we present an outline of our approach to identifying languages and encoding schemes in extremely large sets of multilingual documents. The sets we are analyzing in our Language Observatory project [1] comprise tens of millions of text documents. Our approach allows us to analyze about 250 documents per second (about 20 million documents per day) on a single Linux machine; using multithreaded processing on a cluster of Linux servers, we can easily analyze more than 100 million documents per day.
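
The abstract does not specify the detection method itself. As a hedged illustration only, the following minimal Python sketch shows one common technique for joint language/encoding identification: matching a document's byte n-gram profile against reference profiles keyed by (language, encoding) pairs. All names, the trigram size, and the cosine similarity measure are assumptions for this sketch, not the paper's implementation.

```python
# Hypothetical sketch of byte n-gram profile matching for joint
# language/encoding identification (not the paper's actual method).
from collections import Counter

def byte_ngrams(data: bytes, n: int = 3) -> Counter:
    """Count overlapping byte n-grams in a raw, undecoded byte string."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    norm = (sum(v * v for v in a.values()) ** 0.5) * \
           (sum(v * v for v in b.values()) ** 0.5)
    return dot / norm if norm else 0.0

class LanguageEncodingDetector:
    """Matches documents against profiles keyed by (language, encoding)."""

    def __init__(self):
        self.profiles: dict[tuple[str, str], Counter] = {}

    def train(self, language: str, encoding: str, sample_text: str):
        # Encode the training sample so the profile captures byte patterns
        # specific to this particular language/encoding combination.
        self.profiles[(language, encoding)] = byte_ngrams(sample_text.encode(encoding))

    def identify(self, raw: bytes) -> tuple[str, str]:
        # Return the (language, encoding) pair whose profile best matches.
        doc = byte_ngrams(raw)
        return max(self.profiles, key=lambda key: cosine(self.profiles[key], doc))

# Example usage with illustrative training samples:
det = LanguageEncodingDetector()
det.train("ja", "utf-8", "日本語のサンプルテキスト")
det.train("ja", "shift_jis", "日本語のサンプルテキスト")
lang, enc = det.identify("別の日本語テキスト".encode("shift_jis"))
```

Note that the profiles are built over raw bytes rather than decoded text: decoding a document first would presuppose knowing its encoding, which is precisely what must be identified, so byte-level statistics let a single pass distinguish both language and encoding at once.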