James Allan

UMass Amherst

Also published as: J. Allan

Other people with similar names: James Allen (Rochester)


2019

A Multi-Task Architecture on Relevance-based Neural Query Translation
Sheikh Muhammad Sarwar | Hamed Bonab | James Allan
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics

We describe a multi-task learning approach to train a Neural Machine Translation (NMT) model with a Relevance-based Auxiliary Task (RAT) for search query translation. The translation process for a Cross-lingual Information Retrieval (CLIR) task is usually treated as a black box and performed as an independent step. However, an NMT model trained on sentence-level parallel data is not aware of the vocabulary distribution of the retrieval corpus. We address this problem and propose a multi-task learning architecture that achieves a 16% improvement over a strong baseline on an Italian-English query-document dataset. We show, using both quantitative and qualitative analysis, that our model generates balanced and precise translations through the regularization effect it gains from the multi-task learning paradigm.

FEVER Breaker’s Run of Team NbAuzDrLqg
Youngwoo Kim | James Allan
Proceedings of the Second Workshop on Fact Extraction and VERification (FEVER)

We describe our submission for the Breaker phase of the second Fact Extraction and VERification (FEVER) Shared Task. Our adversarial data can be explained from two perspectives. First, we aimed at testing a model’s ability to retrieve evidence when appropriate query terms could not be easily generated from the claim. Second, we tested a model’s ability to precisely understand the implications of the texts, a requirement we expect to be rare in the FEVER 1.0 dataset. Overall, we suggested six types of adversarial attacks. The evaluation of the submitted systems showed that they were only able to get both the evidence and the label correct in 20% of the data. We also demonstrate our adversarial run analysis in the data development process.

2017

Improving Document Clustering by Removing Unnatural Language
Myungha Jang | Jinho D. Choi | James Allan
Proceedings of the 3rd Workshop on Noisy User-generated Text

Technical documents contain a fair amount of unnatural language, such as tables, formulas, and pseudo-code. Unnatural language can be an important factor in confusing existing NLP tools. This paper presents an effective method of distinguishing unnatural language from natural language, and evaluates the impact of unnatural language detection on NLP tasks such as document clustering. We view this problem as an information extraction task and build a multiclass classification model identifying unnatural language components into four categories. First, we create a new annotated corpus by collecting slides and papers in various formats, PPT, PDF, and HTML, where unnatural language components are annotated into four categories. We then explore features available from plain text to build a statistical model that can handle any format as long as it is converted into plain text. Our experiments show that removing unnatural language components gives an absolute improvement in document clustering by up to 15%. Our corpus and tool are publicly available.

2008

Proceedings of ACL-08: HLT
Johanna D. Moore | Simone Teufel | James Allan | Sadaoki Furui
Proceedings of ACL-08: HLT

Proceedings of ACL-08: HLT, Short Papers
Johanna D. Moore | Simone Teufel | James Allan | Sadaoki Furui
Proceedings of ACL-08: HLT, Short Papers

2007

Information Retrieval On Empty Fields
Victor Lavrenko | Xing Yi | James Allan
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

A Case For Shorter Queries, and Helping Users Create Them
Giridhar Kumaran | James Allan
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

Question Answering Using Integrated Information Retrieval and Information Extraction
Barry Schiffman | Kathleen McKeown | Ralph Grishman | James Allan
Human Language Technologies 2007: The Conference of the North American Chapter of the Association for Computational Linguistics; Proceedings of the Main Conference

Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts
Marti Hearst | Gina-Anne Levow | James Allan
Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Tutorial Abstracts

2005

Using Names and Topics for New Event Detection
Giridhar Kumaran | James Allan
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

Matching Inconsistently Spelled Names in Automatic Speech Recognizer Output for Information Retrieval
Hema Raghavan | James Allan
Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language Processing

2004

Using Soundex Codes for Indexing Names in ASR Documents
Hema Raghavan | James Allan
Proceedings of the Workshop on Interdisciplinary Approaches to Speech Indexing and Retrieval at HLT-NAACL 2004

Cross-Document Coreference on a Large Scale Corpus
Chung Heong Gooi | James Allan
Proceedings of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics: HLT-NAACL 2004

2001

An Evaluation Corpus For Temporal Summarization
Vikash Khandelwal | Rahul Gupta | James Allan
Proceedings of the First International Conference on Human Language Technology Research

Monitoring the News: a TDT demonstration system
David Frey | Rahul Gupta | Vikas Khandelwal | Victor Lavrenko | Anton Leuski | James Allan
Proceedings of the First International Conference on Human Language Technology Research

1993

The SMART Information Retrieval Project
C. Buckley | G. Salton | J. Allan
Human Language Technology: Proceedings of a Workshop Held at Plainsboro, New Jersey, March 21-24, 1993