Marco Lui


2014

Accurate Language Identification of Twitter Messages
Marco Lui | Timothy Baldwin
Proceedings of the 5th Workshop on Language Analysis for Social Media (LASM)

Exploring Methods and Resources for Discriminating Similar Languages
Marco Lui | Ned Letcher | Oliver Adams | Long Duong | Paul Cook | Timothy Baldwin
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects

Automatic Detection and Language Identification of Multilingual Documents
Marco Lui | Jey Han Lau | Timothy Baldwin
Transactions of the Association for Computational Linguistics, Volume 2

Language identification is the task of automatically detecting the language(s) present in a document based on the content of the document. In this work, we address the problem of detecting documents that contain text from more than one language (multilingual documents). We introduce a method that is able to detect that a document is multilingual, identify the languages present, and estimate their relative proportions. We demonstrate the effectiveness of our method over synthetic data, as well as real-world multilingual documents collected from the web.

2013

UniMelb_NLP-CORE: Integrating predictions from multiple domains and feature sets for estimating semantic textual similarity
Spandana Gella | Bahar Salehi | Marco Lui | Karl Grieser | Paul Cook | Timothy Baldwin
Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity

How Noisy Social Media Text, How Diffrnt Social Media Sources?
Timothy Baldwin | Paul Cook | Marco Lui | Andrew MacKinlay | Li Wang
Proceedings of the Sixth International Joint Conference on Natural Language Processing

Classifying English Documents by National Dialect
Marco Lui | Paul Cook
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)

Recovering Casing and Punctuation using Conditional Random Fields
Marco Lui | Li Wang
Proceedings of the Australasian Language Technology Association Workshop 2013 (ALTA 2013)

2012

langid.py: An Off-the-shelf Language Identification Tool
Marco Lui | Timothy Baldwin
Proceedings of the ACL 2012 System Demonstrations

Unsupervised Estimation of Word Usage Similarity
Marco Lui | Timothy Baldwin | Diana McCarthy
Proceedings of the Australasian Language Technology Association Workshop 2012

langid.py for better language modelling
Paul Cook | Marco Lui
Proceedings of the Australasian Language Technology Association Workshop 2012

Feature Stacking for Sentence Classification in Evidence-Based Medicine
Marco Lui
Proceedings of the Australasian Language Technology Association Workshop 2012

2011

Cross-domain Feature Selection for Language Identification
Marco Lui | Timothy Baldwin
Proceedings of 5th International Joint Conference on Natural Language Processing

Predicting Thread Discourse Structure over Technical Web Forums
Li Wang | Marco Lui | Su Nam Kim | Joakim Nivre | Timothy Baldwin
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Language Identification: The Long and the Short of the Matter
Timothy Baldwin | Marco Lui
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

Intelligent Linux Information Access by Data Mining: the ILIAD Project
Timothy Baldwin | David Martinez | Richard Penman | Su Nam Kim | Marco Lui | Li Wang | Andrew MacKinlay
Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media

Multilingual Language Identification: ALTW 2010 Shared Task Data
Timothy Baldwin | Marco Lui
Proceedings of the Australasian Language Technology Association Workshop 2010

Classifying User Forum Participants: Separating the Gurus from the Hacks, and Other Tales of the Internet
Marco Lui | Timothy Baldwin
Proceedings of the Australasian Language Technology Association Workshop 2010