Marco Lui


2014

Automatic Detection and Language Identification of Multilingual Documents
Marco Lui | Jey Han Lau | Timothy Baldwin
Transactions of the Association for Computational Linguistics, Volume 2

Language identification is the task of automatically detecting the language(s) present in a document based on the content of the document. In this work, we address the problem of detecting documents that contain text from more than one language (multilingual documents). We introduce a method that is able to detect that a document is multilingual, identify the languages present, and estimate their relative proportions. We demonstrate the effectiveness of our method over synthetic data, as well as real-world multilingual documents collected from the web.

Accurate Language Identification of Twitter Messages
Marco Lui | Timothy Baldwin
Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM)

Exploring Methods and Resources for Discriminating Similar Languages
Marco Lui | Ned Letcher | Oliver Adams | Long Duong | Paul Cook | Timothy Baldwin
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects

2013

How Noisy Social Media Text, How Diffrnt Social Media Sources?
Timothy Baldwin | Paul Cook | Marco Lui | Andrew MacKinlay | Li Wang
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Classifying English Documents by National Dialect
Marco Lui | Paul Cook
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)

Recovering Casing and Punctuation using Conditional Random Fields
Marco Lui | Li Wang
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)

UniMelb_NLP-CORE: Integrating predictions from multiple domains and feature sets for estimating semantic textual similarity
Spandana Gella | Bahar Salehi | Marco Lui | Karl Grieser | Paul Cook | Timothy Baldwin
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

2012

Unsupervised Estimation of Word Usage Similarity
Marco Lui | Timothy Baldwin | Diana McCarthy
Proceedings of the Australasian Language Technology Association Workshop 2012

langid.py for better language modelling
Paul Cook | Marco Lui
Proceedings of the Australasian Language Technology Association Workshop 2012

Feature Stacking for Sentence Classification in Evidence-Based Medicine
Marco Lui
Proceedings of the Australasian Language Technology Association Workshop 2012

langid.py: An Off-the-shelf Language Identification Tool
Marco Lui | Timothy Baldwin
Proceedings of the ACL 2012 System Demonstrations

2011

Cross-domain Feature Selection for Language Identification
Marco Lui | Timothy Baldwin
Proceedings of 5th International Joint Conference on Natural Language Processing

Predicting Thread Discourse Structure over Technical Web Forums
Li Wang | Marco Lui | Su Nam Kim | Joakim Nivre | Timothy Baldwin
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Multilingual Language Identification: ALTW 2010 Shared Task Data
Timothy Baldwin | Marco Lui
Proceedings of the Australasian Language Technology Association Workshop 2010

Classifying User Forum Participants: Separating the Gurus from the Hacks, and Other Tales of the Internet
Marco Lui | Timothy Baldwin
Proceedings of the Australasian Language Technology Association Workshop 2010

Intelligent Linux Information Access by Data Mining: the ILIAD Project
Timothy Baldwin | David Martinez | Richard Penman | Su Nam Kim | Marco Lui | Li Wang | Andrew MacKinlay
Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media

Language Identification: The Long and the Short of the Matter
Timothy Baldwin | Marco Lui
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics