Julien Plu


2023

D2KLab at SemEval-2023 Task 2: Leveraging T-NER to Develop a Fine-Tuned Multilingual Model for Complex Named Entity Recognition
Thibault Ehrhart | Julien Plu | Raphael Troncy
Proceedings of the 17th International Workshop on Semantic Evaluation (SemEval-2023)

This paper presents D2KLab’s system used for the shared task of “Multilingual Complex Named Entity Recognition (MultiCoNER II)”, as part of SemEval 2023 Task 2. The system relies on a fine-tuned transformer-based language model for extracting named entities. In addition to the architecture of the system, we discuss our results and observations.

2021

Datasets: A Community Library for Natural Language Processing
Quentin Lhoest | Albert Villanova del Moral | Yacine Jernite | Abhishek Thakur | Patrick von Platen | Suraj Patil | Julien Chaumond | Mariama Drame | Julien Plu | Lewis Tunstall | Joe Davison | Mario Šaško | Gunjan Chhablani | Bhavitvya Malik | Simon Brandeis | Teven Le Scao | Victor Sanh | Canwen Xu | Nicolas Patry | Angelina McMillan-Major | Philipp Schmid | Sylvain Gugger | Clément Delangue | Théo Matussière | Lysandre Debut | Stas Bekman | Pierric Cistac | Thibault Goehringer | Victor Mustar | François Lagunas | Alexander Rush | Thomas Wolf
Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

The scale, variety, and quantity of publicly available NLP datasets have grown rapidly as researchers propose new tasks, larger models, and novel benchmarks. Datasets is a community library for contemporary NLP designed to support this ecosystem. Datasets aims to standardize end-user interfaces, versioning, and documentation, while providing a lightweight front-end that behaves similarly for small datasets as for internet-scale corpora. The design of the library incorporates a distributed, community-driven approach to adding datasets and documenting usage. After a year of development, the library now includes more than 650 unique datasets, has more than 250 contributors, and has helped support a variety of novel cross-dataset research projects and shared tasks. The library is available at https://github.com/huggingface/datasets.

2020

Transformers: State-of-the-Art Natural Language Processing
Thomas Wolf | Lysandre Debut | Victor Sanh | Julien Chaumond | Clement Delangue | Anthony Moi | Pierric Cistac | Tim Rault | Remi Louf | Morgan Funtowicz | Joe Davison | Sam Shleifer | Patrick von Platen | Clara Ma | Yacine Jernite | Julien Plu | Canwen Xu | Teven Le Scao | Sylvain Gugger | Mariama Drame | Quentin Lhoest | Alexander Rush
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations

Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models, and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. Transformers is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered state-of-the-art Transformer architectures under a unified API. Backing this library is a curated collection of pretrained models made by and available for the community. Transformers is designed to be extensible by researchers, simple for practitioners, and fast and robust in industrial deployments. The library is available at https://github.com/huggingface/transformers.

2018

Sanaphor++: Combining Deep Neural Networks with Semantics for Coreference Resolution
Julien Plu | Roman Prokofyev | Alberto Tonon | Philippe Cudré-Mauroux | Djellel Eddine Difallah | Raphaël Troncy | Giuseppe Rizzo
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)

JeuxDeLiens: Word Embeddings and Path-Based Similarity for Entity Linking using the French JeuxDeMots Lexical Semantic Network
Julien Plu | Kevin Cousot | Mathieu Lafourcade | Raphaël Troncy | Giuseppe Rizzo
Actes de la Conférence TALN. Volume 1 - Articles longs, articles courts de TALN

Entity linking systems typically rely on encyclopedic knowledge bases such as DBpedia or Freebase. In this paper, we instead use a French lexical-semantic network named JeuxDeMots to jointly type and link entities. Our approach combines word embeddings with a path-based similarity measure and yields encouraging results on a set of documents from the French newspaper Le Monde.

2016

Context-enhanced Adaptive Entity Linking
Filip Ilievski | Giuseppe Rizzo | Marieke van Erp | Julien Plu | Raphaël Troncy
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

More and more knowledge bases are publicly available as linked data. Since these knowledge bases contain structured descriptions of real-world entities, they can be exploited by entity linking systems that anchor entity mentions from text to the most relevant resources describing those entities. In this paper, we investigate the adaptation of the entity linking task using contextual knowledge. The key intuition is that entity linking can be customized depending on the textual content, as well as on the application that would make use of the extracted information. We present an adaptive approach that relies on contextual knowledge from text to enhance the performance of ADEL, a hybrid linguistic and graph-based entity linking system. We evaluate our approach on a domain-specific corpus consisting of annotated WikiNews articles.

Evaluating Entity Linking: An Analysis of Current Benchmark Datasets and a Roadmap for Doing a Better Job
Marieke van Erp | Pablo Mendes | Heiko Paulheim | Filip Ilievski | Julien Plu | Giuseppe Rizzo | Joerg Waitelonis
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Entity linking has become a popular task in both the natural language processing and semantic web communities. However, we find that the benchmark datasets for entity linking tasks do not accurately evaluate entity linking systems. In this paper, we aim to chart the strengths and weaknesses of current benchmark datasets and sketch a roadmap for the community to devise better benchmark datasets.