Marco Lui
2014
Automatic Detection and Language Identification of Multilingual Documents
Marco Lui | Jey Han Lau | Timothy Baldwin
Transactions of the Association for Computational Linguistics, Volume 2
Language identification is the task of automatically detecting the language(s) present in a document based on the content of the document. In this work, we address the problem of detecting documents that contain text from more than one language (multilingual documents). We introduce a method that is able to detect that a document is multilingual, identify the languages present, and estimate their relative proportions. We demonstrate the effectiveness of our method over synthetic data, as well as real-world multilingual documents collected from the web.
Accurate Language Identification of Twitter Messages
Marco Lui | Timothy Baldwin
Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM)
Exploring Methods and Resources for Discriminating Similar Languages
Marco Lui | Ned Letcher | Oliver Adams | Long Duong | Paul Cook | Timothy Baldwin
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects
2013
How Noisy Social Media Text, How Diffrnt Social Media Sources?
Timothy Baldwin | Paul Cook | Marco Lui | Andrew MacKinlay | Li Wang
Proceedings of the Sixth International Joint Conference on Natural Language Processing
UniMelb_NLP-CORE: Integrating predictions from multiple domains and feature sets for estimating semantic textual similarity
Spandana Gella | Bahar Salehi | Marco Lui | Karl Grieser | Paul Cook | Timothy Baldwin
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity
Classifying English Documents by National Dialect
Marco Lui | Paul Cook
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)
Recovering Casing and Punctuation using Conditional Random Fields
Marco Lui | Li Wang
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)
2012
langid.py: An Off-the-shelf Language Identification Tool
Marco Lui | Timothy Baldwin
Proceedings of the ACL 2012 System Demonstrations
Unsupervised Estimation of Word Usage Similarity
Marco Lui | Timothy Baldwin | Diana McCarthy
Proceedings of the Australasian Language Technology Association Workshop 2012
langid.py for better language modelling
Paul Cook | Marco Lui
Proceedings of the Australasian Language Technology Association Workshop 2012
Feature Stacking for Sentence Classification in Evidence-Based Medicine
Marco Lui
Proceedings of the Australasian Language Technology Association Workshop 2012
2011
Predicting Thread Discourse Structure over Technical Web Forums
Li Wang | Marco Lui | Su Nam Kim | Joakim Nivre | Timothy Baldwin
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing
Cross-domain Feature Selection for Language Identification
Marco Lui | Timothy Baldwin
Proceedings of 5th International Joint Conference on Natural Language Processing
2010
Language Identification: The Long and the Short of the Matter
Timothy Baldwin | Marco Lui
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics
Multilingual Language Identification: ALTW 2010 Shared Task Data
Timothy Baldwin | Marco Lui
Proceedings of the Australasian Language Technology Association Workshop 2010