David Zajic


A Random Forest System Combination Approach for Error Detection in Digital Dictionaries
Michael Bloodgood | Peng Ye | Paul Rodrigues | David Zajic | David Doermann
Proceedings of the Workshop on Innovative Hybrid Approaches to the Processing of Textual Data


Error Correction for Arabic Dictionary Lookup
C. Anton Rytting | Paul Rodrigues | Tim Buckwalter | David Zajic | Bridget Hirsch | Jeff Carnes | Nathanael Lynn | Sarah Wayland | Chris Taylor | Jason White | Charles Blake III | Evelyn Browne | Corey Miller | Tristan Purvis
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

We describe a new Arabic spelling correction system which is intended for use with electronic dictionary search by learners of Arabic. Unlike other spelling correction systems, this system does not depend on a corpus of attested student errors but on student- and teacher-generated ratings of confusable pairs of phonemes or letters. Separate error modules for keyboard mistypings, phonetic confusions, and dialectal confusions are combined to create a weighted finite-state transducer that calculates the likelihood that an input string could correspond to each citation form in a dictionary of Iraqi Arabic. Results are ranked by the estimated likelihood that a citation form could be misheard, mistyped, or mistranscribed for the input given by the user. To evaluate the system, we developed a noisy-channel model trained on students' speech errors and use it to perturb citation forms from a dictionary. We compare our system to a baseline based on Levenshtein distance and find that, when evaluated on single-error queries, our system performs 28% better than the baseline (overall MRR) and is twice as good at returning the correct dictionary form as the top-ranked result. We believe this to be the first spelling correction system designed for a spoken, colloquial dialect of Arabic.


Using Citations to Generate Surveys of Scientific Paradigms
Saif Mohammad | Bonnie Dorr | Melissa Egan | Ahmed Hassan | Pradeep Muthukrishan | Vahed Qazvinian | Dragomir Radev | David Zajic
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics


A Methodology for Extrinsic Evaluation of Text Summarization: Does ROUGE Correlate?
Bonnie Dorr | Christof Monz | Stacy President | Richard Schwartz | David Zajic
Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization


Hedge Trimmer: A Parse-and-Trim Approach to Headline Generation
Bonnie Dorr | David Zajic | Richard Schwartz
Proceedings of the HLT-NAACL 03 Text Summarization Workshop