Yoshiki Mikami



2012

Stemming Tigrinya Words for Information Retrieval
Omer Osman | Yoshiki Mikami
Proceedings of COLING 2012: Demonstration Papers

2008

A Rule-based Syllable Segmentation of Myanmar Text
Zin Maung Maung | Yoshiki Mikami
Proceedings of the IJCNLP-08 Workshop on NLP for Less Privileged Languages

Technical Terminology in Asian Languages: Different Approaches to Adopting Engineering Terms
Makiko Matsuda | Tomoe Takahashi | Hiroki Goto | Yoshikazu Hayase | Robin Lee Nagano | Yoshiki Mikami
Proceedings of the 6th Workshop on Asian Language Resources

The Link Structure of Language Communities and its Implication for Language-specific Crawling
Rizza Caminero | Yoshiki Mikami
Proceedings of the 6th Workshop on Asian Language Resources

2005

Language and Encoding Scheme Identification of Extremely Large Sets of Multilingual Text
Pavol Zavarsky | Yoshiki Mikami | Shota Wada
Proceedings of Machine Translation Summit X: Posters

In this paper we present an outline of our approach to identifying languages and encoding schemes in extremely large sets of multilingual documents. The sets we are analyzing in our Language Observatory project [1] comprise tens of millions of text documents. Our approach allows us to analyze about 250 documents per second (about 20 million documents per day) on a single Linux machine; using multithreaded processing on a cluster of Linux servers, we can easily analyze more than 100 million documents per day.
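
The abstract does not specify the detection method itself. As a hedged illustration only, the following minimal Python sketch shows one common technique for joint language/encoding identification: matching a document's byte n-gram profile against reference profiles keyed by (language, encoding) pairs. All names, the trigram size, and the cosine similarity measure are assumptions for this sketch, not the paper's implementation.

```python
# Hypothetical sketch of byte n-gram profile matching for joint
# language/encoding identification (not the paper's actual method).
from collections import Counter

def byte_ngrams(data: bytes, n: int = 3) -> Counter:
    """Count overlapping byte n-grams in a raw, undecoded byte string."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(a[g] * b[g] for g in set(a) & set(b))
    norm = (sum(v * v for v in a.values()) ** 0.5) * \
           (sum(v * v for v in b.values()) ** 0.5)
    return dot / norm if norm else 0.0

class LanguageEncodingDetector:
    """Matches documents against profiles keyed by (language, encoding)."""

    def __init__(self):
        self.profiles: dict[tuple[str, str], Counter] = {}

    def train(self, language: str, encoding: str, sample_text: str):
        # Encode the training sample so the profile captures byte patterns
        # specific to this particular language/encoding combination.
        self.profiles[(language, encoding)] = byte_ngrams(sample_text.encode(encoding))

    def identify(self, raw: bytes) -> tuple[str, str]:
        # Return the (language, encoding) pair whose profile best matches.
        doc = byte_ngrams(raw)
        return max(self.profiles, key=lambda key: cosine(self.profiles[key], doc))

# Example usage with illustrative training samples:
det = LanguageEncodingDetector()
det.train("ja", "utf-8", "日本語のサンプルテキスト")
det.train("ja", "shift_jis", "日本語のサンプルテキスト")
lang, enc = det.identify("別の日本語テキスト".encode("shift_jis"))
```

Note that the profiles are built over raw bytes rather than decoded text: decoding a document first would presuppose knowing its encoding, which is precisely what must be identified, so byte-level statistics let a single pass distinguish both language and encoding at once.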