Roger G. Garside



1994

The use of approximate string matching techniques in the alignment of sentences in parallel corpora
Anthony M. McEnery | Michael P. Oakes | Roger G. Garside
Proceedings of the Second International Conference on Machine Translation: Ten years on

Parallel corpora such as the Canadian Hansard corpus and the International Telecommunications Union (ITU) corpus each provide the same text in two or more languages, and have been aptly described as the "Rosetta Stone" of modern corpus linguistics [1]. Their use within MT is burgeoning, permeating all levels of the discipline, and even being used as the basis of full-blown statistically based MT systems. This paper will concern itself with the task of automatic bilingual lexicon construction, which is one of the major goals of the CRATER project (“Corpus Resources and Terminology Extraction”, funded under the MLAP initiative of the CEC, grant number MLAP-93/20). The approach to bilingual lexicon alignment taken here entails the alignment of corpora, and then a detailed search through the corpus for lexical cognates. Consequently the paper will begin with a brief discussion of the alignment procedures used on the project to date, and move to a discussion of various similarity metrics used to evaluate lexical similarity.
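As an illustration of the kind of lexical similarity metric the abstract refers to, the following is a minimal sketch of one common approximate string matching measure, Dice's coefficient over character bigrams, applied to candidate cognate pairs. The abstract does not say which metrics the paper evaluates, so the choice of metric, the function names (`bigrams`, `dice_similarity`, `cognate_candidates`), and the 0.6 threshold are all assumptions made for illustration only.

```python
from collections import Counter


def bigrams(word: str) -> Counter:
    """Return the multiset of adjacent character pairs in a lowercased word."""
    w = word.lower()
    return Counter(w[i:i + 2] for i in range(len(w) - 1))


def dice_similarity(a: str, b: str) -> float:
    """Dice's coefficient: 2 * shared bigrams / total bigrams in both words."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        # Both words are shorter than two characters, so there are no bigrams.
        return 0.0
    shared = sum((ba & bb).values())  # multiset intersection
    return 2.0 * shared / total


def cognate_candidates(src_words, tgt_words, threshold=0.6):
    """Score every source/target word pair and keep those at or above the
    threshold. The threshold value is a hypothetical choice, not from the paper."""
    pairs = []
    for s in src_words:
        for t in tgt_words:
            score = dice_similarity(s, t)
            if score >= threshold:
                pairs.append((s, t, score))
    # Highest-scoring pairs first.
    return sorted(pairs, key=lambda p: -p[2])


if __name__ == "__main__":
    english = ["government", "house"]
    french = ["gouvernement", "chambre"]
    for s, t, score in cognate_candidates(english, french):
        print(f"{s}\t{t}\t{score:.2f}")
```

In a setting like the one described, the two word lists would presumably come from a pair of already aligned sentences, which keeps the number of comparisons small and reduces chance matches between unrelated words.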