Integrating the Management of Personal Data Protection and Open Science with Research Ethics
Dave Lewis | Joss Moorkens | Kaniz Fatema
Proceedings of the First ACL Workshop on Ethics in Natural Language Processing
We examine the impact of the EU General Data Protection Regulation and the push from research funders to provide open access research data on the current practices in Language Technology Research. We analyse the challenges that arise and the opportunities to address many of them through the use of existing open data practices. We also discuss the impact on current practice in research ethics.
Though a number of web-based CAT tools have emerged over recent years, to date the most common form of CAT tool used by translators remains the desktop-based CAT tool. However, currently none of the most commonly used desktop-based CAT tools provide a means of measuring translation speed at a segment level. This metric is important, as previous work on MT productivity testing has shown that edit distance can be a misleading measure of MT post-editing effort. In this paper we present iOmegaT, an instrumented version of a popular desktop-based open-source CAT tool called OmegaT. We survey a number of similar applications and outline some of the weaknesses of web-based CAT tools for experienced professional translators. On the basis of two productivity tests carried out using iOmegaT, we show why it is important to be able to identify fast, good post-editors to maximize MT utility and how this is problematic using only edit-distance measures. Finally, we argue how and why instrumentation could be added to more commonly used desktop-based CAT tools that are paid for by freelance translators if their privacy is respected.