David Moeljadi


2020

Building the Old Javanese Wordnet
David Moeljadi | Zakariya Pamuji Aminullah
Proceedings of the 12th Language Resources and Evaluation Conference

This paper discusses the construction and the ongoing development of the Old Javanese Wordnet. The words were extracted from the digitized version of the Old Javanese–English Dictionary (Zoetmulder, 1982). The wordnet is built using the ‘expansion’ approach (Vossen, 1998), leveraging the Princeton Wordnet’s core synsets and semantic hierarchy, as well as scientific names. The main goal of our project was to produce a high-quality, human-curated resource. As of December 2019, the Old Javanese Wordnet contains 2,054 concepts or synsets and 5,911 senses. It is released under a Creative Commons Attribution 4.0 International License (CC BY 4.0). We are still developing it and adding more synsets and senses. We believe that the lexical data made available by this wordnet will be useful for a variety of future uses, such as the development of a Modern Javanese Wordnet, language processing tasks, and linguistic research on Javanese.

2016

Sentiment Analysis for Low Resource Languages: A Study on Informal Indonesian Tweets
Tuan Anh Le | David Moeljadi | Yasuhide Miura | Tomoko Ohkuma
Proceedings of the 12th Workshop on Asian Language Resources (ALR12)

This paper describes our attempt to build a sentiment analysis system for Indonesian tweets. With this system, we can study and identify sentiments and opinions in a text or document computationally. We used four thousand manually labeled tweets collected in February and March 2016 to build the model. Because of the variety of content in tweets, we classify tweets into eight groups in total, including pos(itive), neg(ative), and neu(tral). Finally, we obtained 73.2% accuracy with a Long Short-Term Memory (LSTM) model without a normalizer.

Identifying and Exploiting Definitions in Wordnet Bahasa
David Moeljadi | Francis Bond
Proceedings of the 8th Global WordNet Conference (GWC)

This paper describes our attempts to add Indonesian definitions to synsets in the Wordnet Bahasa (Nurril Hirfana Mohamed Noor et al., 2011; Bond et al., 2014), to extract semantic relations between lemmas and definitions for nouns and verbs, such as synonym, hyponym, hypernym and instance hypernym, and to generally improve Wordnet. The original, somewhat noisy, definitions for Indonesian came from the Asian Wordnet project (Riza et al., 2010). The basic method of extracting the relations is based on Bond et al. (2004). Before the relations could be extracted, the definitions were cleaned up and tokenized. We found that the definitions cannot be completely cleaned up because of many misspellings and bad translations. However, we could identify four semantic relations in 57.10% of noun and verb definitions. For the remaining 42.90%, we propose to add 149 new Indonesian lemmas and make some improvements to Wordnet Bahasa and Wordnet in general.

2015

Building an HPSG-based Indonesian Resource Grammar (INDRA)
David Moeljadi | Francis Bond | Sanghoun Song
Proceedings of the Grammar Engineering Across Frameworks (GEAF) 2015 Workshop