Aric Bills


2020

Corpora for Cross-Language Information Retrieval in Six Less-Resourced Languages
Ilya Zavorin | Aric Bills | Cassian Corey | Michelle Morrison | Audrey Tong | Richard Tong
Proceedings of the workshop on Cross-Language Search and Summarization of Text and Speech (CLSSTS2020)

The Machine Translation for English Retrieval of Information in Any Language (MATERIAL) research program, sponsored by the Intelligence Advanced Research Projects Activity (IARPA), focuses on rapid development of end-to-end systems capable of retrieving foreign language speech and text documents relevant to different types of English queries that may be further restricted by domain. Those systems also provide evidence of relevance of the retrieved content in the form of English summaries. The program focuses on Less-Resourced Languages and provides its performer teams with very limited amounts of annotated training data. This paper describes the corpora that were created for system development and evaluation for the six languages released by the program to date: Tagalog, Swahili, Somali, Lithuanian, Bulgarian and Pashto. The corpora include build packs to train Machine Translation and Automatic Speech Recognition systems; document sets in three text and three speech genres annotated for domain and partitioned for analysis, development and evaluation; and queries of several types together with corresponding binary relevance judgments against the entire set of documents. The paper also describes a detection metric called Actual Query Weighted Value developed by the program to evaluate end-to-end system performance.

2017

Endangered Data for Endangered Languages: Digitizing Print Dictionaries
Michael Maxwell | Aric Bills
Proceedings of the 2nd Workshop on the Use of Computational Methods in the Study of Endangered Languages

2014

ArCADE: An Arabic Corpus of Auditory Dictation Errors
C. Anton Rytting | Paul Rodrigues | Tim Buckwalter | Valerie Novak | Aric Bills | Noah H. Silbert | Mohini Madgavkar
Proceedings of the Ninth Workshop on Innovative Use of NLP for Building Educational Applications