Dirk De Hertog


Contextualized Usage-Based Material Selection
Dirk De Hertog | Piet Desmet
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

Deep Learning Architecture for Complex Word Identification
Dirk De Hertog | Anaïs Tack
Proceedings of the Thirteenth Workshop on Innovative Use of NLP for Building Educational Applications

We describe a system for the CWI task that includes information on five aspects of the (complex) lexical item, namely distributional information of the item itself, morphological structure, psychological measures, corpus counts and topical information. We constructed a deep learning architecture that combines those features and applied it to the probabilistic and binary classification tasks for all English sets and Spanish. We achieved reasonable performance on all sets, with the best performance on the probabilistic task, particularly on the English news set (MAE 0.054 and F1-score of 0.872). An analysis of the results shows that reasonable performance can be achieved with a single architecture without any domain-specific tweaking of the parameter settings, and that distributional features capture almost all of the information also found in hand-crafted features.


TermWise: A CAT-tool with Context-Sensitive Terminological Support.
Kris Heylen | Stephen Bond | Dirk De Hertog | Ivan Vulić | Hendrik Kockaert
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Increasingly, large bilingual document collections are being made available online, especially in the legal domain. This type of Big Data is a valuable resource that specialized translators exploit to search for informative examples of how domain-specific expressions should be translated. However, general-purpose search engines are not optimized to retrieve previous translations that are maximally relevant to a translator. In this paper, we report on the TermWise project, a cooperation of terminologists, corpus linguists and computer scientists that aims to leverage big online translation data for terminological support to legal translators at the Belgian Federal Ministry of Justice. The project developed dedicated knowledge extraction algorithms and a server-based tool to provide translators with the most relevant previous translations of domain-specific expressions relative to the current translation assignment. The functionality is implemented as an extra database, a Term&Phrase Memory, that is meant to be integrated with existing Computer Assisted Translation tools. In the paper, we give an overview of the system, give a demo of the user interface, present a user-based evaluation by translators, and discuss how the tool is part of the general evolution towards exploiting Big Data in translation.


Etude sémantique des mots-clés et des marqueurs lexicaux stables dans un corpus technique (Semantic Analysis of Keywords and Stable Lexical Markers in a Technical Corpus) [in French]
Ann Bertels | Dirk De Hertog | Kris Heylen
Proceedings of the Joint Conference JEP-TALN-RECITAL 2012, volume 2: TALN