2016
Calculating the percentage reduction in translator effort when using machine translation
Andrzej Zydrón | Qun Liu
Proceedings of Translating and the Computer 38
Using BabelNet to Improve OOV Coverage in SMT
Jinhua Du | Andy Way | Andrzej Zydron
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)
Out-of-vocabulary words (OOVs) are a ubiquitous and difficult problem in statistical machine translation (SMT). This paper studies different strategies of using BabelNet to alleviate the negative impact brought about by OOVs. BabelNet is a multilingual encyclopedic dictionary and a semantic network, which not only includes lexicographic and encyclopedic terms, but connects concepts and named entities in a very large network of semantic relations. By taking advantage of the knowledge in BabelNet, three different methods (using direct training data, domain-adaptation techniques and the BabelNet API) are proposed in this paper to obtain translations for OOVs to improve system performance. Experimental results on English–Polish and English–Chinese language pairs show that domain adaptation can better utilize BabelNet knowledge and performs better than other methods. The results also demonstrate that BabelNet is a really useful tool for improving translation performance of SMT systems.
2015
FALCON: Building the localization web
Andrzej Zydroń
Proceedings of Translating and the Computer 37
Neocortical computing: Next generation machine translation
Andrzej Zydroń
Proceedings of Translating and the Computer 37
2014
The dos and don’ts of XML document localization
Andrzej Zydroń
Proceedings of Translating and the Computer 36
2013
Using Excel as an XLIFF editor: You cannot be serious!
Andrzej Zydroń
Proceedings of Translating and the Computer 35
2012
On Using Linked Data for Language Resource Sharing in the Long Tail of the Localisation Market
David Lewis | Alexander O’Connor | Andrzej Zydroń | Gerd Sjögren | Rahzeb Choudhury
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)
Innovations in localisation have focused on the collection and leverage of language resources. However, smaller localisation clients and Language Service Providers are poorly positioned to exploit the benefits of language resource reuse in comparison to larger companies. Their low throughput of localised content means they have little opportunity to amass significant resources, such as Translation memories and Terminology databases, to reuse between jobs or to train statistical machine translation engines tailored to their domain specialisms and language pairs. We propose addressing this disadvantage via the sharing and pooling of language resources. However, the current localisation standards do not support multiparty sharing, are not well integrated with emerging language resource standards and do not address key requirements in determining ownership and license terms for resources. We survey standards and research in the area of Localisation, Language Resources and Language Technologies to leverage existing localisation standards via Linked Data methodologies. This points to the potential of using semantic representation of existing data models for localisation workflow metadata, terminology, parallel text, provenance and access control, which we illustrate with an RDF example.
2003
xml:tm - Using XML technology to reduce the cost of authoring and translation
Andrzej Zydron
Proceedings of Translating and the Computer 25