David D. Lewis

Also published as: David Lewis


2016

An assessment of the intellectual property requirements for data used in machine-aided translation is provided based on a recent EC-funded legal review. This is compared against the capabilities offered by current linked open data standards from the W3C for publishing and sharing translation memories from translation projects, and proposals for adequately addressing the intellectual property needs of stakeholders in translation projects using open data vocabularies are suggested.

2015

2014

As language resources start to become available in linked data formats, it becomes relevant to consider how linked data interoperability can play a role in active language processing workflows as well as for more static language resource publishing. This paper proposes that linked data may have a valuable role to play in tracking the use and generation of language resources in such workflows in order to assess and improve the performance of the language technologies that use the resources, based on feedback from the human involvement typically required within such processes. We refer to this as Active Curation of the language resources, since it is performed systematically over language processing workflows to continuously improve the quality of the resource in specific applications, rather than via dedicated curation steps. We use modern localisation workflows, i.e. assisted by machine translation and text analytics services, to explain how linked data can support such active curation. By referencing how a suitable linked data vocabulary can be assembled by combining existing linked data vocabularies and meta-data from other multilingual content processing annotations and tool exchange standards we aim to demonstrate the relative ease with which active curation can be deployed more broadly.

2012

Innovations in localisation have focused on the collection and leverage of language resources. However, smaller localisation clients and Language Service Providers are poorly positioned to exploit the benefits of language resource reuse in comparison to larger companies. Their low throughput of localised content means they have little opportunity to amass significant resources, such as Translation memories and Terminology databases, to reuse between jobs or to train statistical machine translation engines tailored to their domain specialisms and language pairs. We propose addressing this disadvantage via the sharing and pooling of language resources. However, the current localisation standards do not support multiparty sharing, are not well integrated with emerging language resource standards and do not address key requirements in determining ownership and license terms for resources. We survey standards and research in the area of Localisation, Language Resources and Language Technologies to leverage existing localisation standards via Linked Data methodologies. This points to the potential of using semantic representation of existing data models for localisation workflow metadata, terminology, parallel text, provenance and access control, which we illustrate with an RDF example.

2009

1994

1993

1992

1991

1990