Abdelhak Kelious
This article explores methods for automatically predicting lexical complexity in a multilingual setting using advanced natural language processing models. More specifically, it studies the use of transfer learning and data augmentation techniques in a supervised learning framework, highlighting the notable value of multilingual approaches. We also evaluate the potential of large generative language models for lexical complexity prediction. Through different prompting strategies (zero-shot, one-shot, and chain-of-thought prompts), we analyze model performance across several languages. Our results show that, although generative models achieve promising performance, their predictive quality remains variable, and task-specific models continue to outperform them when sufficient training data is available.
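To make the prompting strategies above concrete, here is a minimal sketch of zero-shot versus one-shot querying for a lexical complexity score. The prompt wording, the 0-1 scale, the example sentence, and the model name are illustrative assumptions, not the actual prompts used in the paper.

```python
# Minimal sketch of zero-shot vs. one-shot prompting for lexical
# complexity prediction. Prompt text and scale are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ZERO_SHOT = (
    "Rate the lexical complexity of the word '{word}' in the sentence "
    "below on a scale from 0 (very easy) to 1 (very difficult). "
    "Answer with a single number.\n\nSentence: {sentence}"
)

# A single worked example prepended for the one-shot setting (illustrative).
ONE_SHOT_EXAMPLE = "Sentence: The cat sat on the mat.\nWord: cat\nComplexity: 0.05\n\n"

def predict_complexity(word: str, sentence: str, one_shot: bool = False) -> float:
    prompt = ZERO_SHOT.format(word=word, sentence=sentence)
    if one_shot:
        prompt = ONE_SHOT_EXAMPLE + prompt
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0.0,  # deterministic output for scoring
        messages=[{"role": "user", "content": prompt}],
    )
    return float(response.choices[0].message.content.strip())
```

A chain-of-thought variant would additionally ask the model to explain what makes the word difficult before emitting the final number.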
In this work, we explore the prediction of lexical complexity by combining supervised approaches with the use of large language models (LLMs). We first evaluate the impact of different prompting strategies (zero-shot, one-shot, and chain-of-thought) on the quality of the predictions, comparing the results with human annotations from the CompLex 2.0 corpus. Our results indicate that LLMs, and in particular gpt-4o, benefit from explicit instructions to better approximate human judgments, although some discrepancies remain. Moreover, a calibration approach that aligns LLM predictions with human judgments based on a small amount of manually annotated data appears to be a promising solution for improving the reliability of annotations in a supervised scenario.
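The calibration idea can be sketched as fitting a simple mapping from raw LLM scores to human judgments on a small annotated sample. The use of linear regression and the data values here are assumptions for illustration; the paper's calibration method may differ.

```python
# Minimal sketch: calibrate raw LLM complexity scores against a few
# human-annotated examples. Linear regression is an assumed choice.
import numpy as np
from sklearn.linear_model import LinearRegression

# A handful of (LLM score, human score) pairs; values are illustrative.
llm_scores = np.array([[0.10], [0.35], [0.50], [0.72], [0.90]])
human_scores = np.array([0.05, 0.25, 0.45, 0.60, 0.80])

calibrator = LinearRegression().fit(llm_scores, human_scores)

# Map a new raw LLM score onto the human annotation scale.
raw = np.array([[0.65]])
print(calibrator.predict(raw))
```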
This study focuses on lexical complexity prediction. We explore deep learning methods to assess the complexity of a word based on its context. More specifically, we examine how to use pre-trained language models to encode the target word and its context, combining them with additional frequency-based features. Our approach outperforms the best systems from SemEval-2021 (Shardlow et al., 2021). Finally, we conduct a comparative study with ChatGPT to assess its potential for predicting lexical complexity in comparison with a model dedicated to this task.
There are several lines of work in natural language processing on identifying lexical complexity, motivated by applications such as text simplification, the selection of more suitable content, and other specific tasks. Words can have multiple definitions and degrees of complexity depending on the context in which they appear. One solution being investigated is lexical complexity prediction, where computational methods are used to evaluate the difficulty of vocabulary for language learners and offer personalized assistance. In this work, we explore deep learning methods to assess the complexity of a word based on its context. Specifically, we investigate how to use pre-trained language models to encode both the sentence and the target word, and then fine-tune them by combining them with additional frequency-based features. Our approach achieved superior results compared to the best systems in SemEval-2021 (Shardlow et al., 2021), as demonstrated by an R² score of 0.65. Finally, we carry out a comparative study with ChatGPT to assess its potential for predicting lexical complexity and to see whether prompt engineering can be an alternative for this task; we also discuss the advantages and limitations of ChatGPT.
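A minimal sketch of the supervised setup described above: encode the sentence and the target word with a pre-trained model, concatenate a frequency-based feature, and regress to a complexity score. The model choice (bert-base-uncased), the [CLS] pooling, and the frequency source (the wordfreq library) are assumptions for illustration, not the paper's exact configuration.

```python
# Sketch: pre-trained encoder over (sentence, target word) pairs plus a
# frequency feature, with a regression head producing a score in [0, 1].
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer
from wordfreq import zipf_frequency

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

class ComplexityRegressor(nn.Module):
    def __init__(self, model_name: str = "bert-base-uncased", n_extra: int = 1):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(model_name)
        self.head = nn.Linear(self.encoder.config.hidden_size + n_extra, 1)

    def forward(self, input_ids, attention_mask, extra_features):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]           # [CLS] embedding of the pair
        features = torch.cat([cls, extra_features], dim=-1)
        return torch.sigmoid(self.head(features))   # complexity score in [0, 1]

sentence = "The physician prescribed an anticoagulant."
word = "anticoagulant"
inputs = tokenizer(sentence, word, return_tensors="pt")  # sentence and target word as a pair
freq = torch.tensor([[zipf_frequency(word, "en")]])      # frequency-based feature
model = ComplexityRegressor()
score = model(inputs["input_ids"], inputs["attention_mask"], freq)
```

Fine-tuning such a model against human-annotated scores (e.g., with an MSE loss) is the kind of task-specific training that the comparison with ChatGPT is measured against.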
This study introduces a dedicated model aimed at solving the BRAINTEASER task, a novel challenge designed to assess models’ lateral thinking capabilities through sentence and word puzzles. Our model demonstrates remarkable efficacy, securing Rank 1 in sentence puzzle solving during the test phase with an overall score of 0.98. Additionally, we explore the comparative performance of ChatGPT, specifically analyzing how variations in temperature settings affect its ability to engage in lateral thinking and problem-solving. Our findings indicate a notable performance disparity between the dedicated model and ChatGPT, underscoring the potential of specialized approaches in enhancing creative reasoning in AI.
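The temperature study could be set up as a simple sweep like the sketch below: the same puzzle is submitted at several temperature settings and the answers compared. The model name, the puzzle text, and the chosen temperature values are illustrative assumptions.

```python
# Minimal sketch of a temperature sweep for lateral-thinking puzzles.
from openai import OpenAI

client = OpenAI()

puzzle = (
    "A man who lives on the tenth floor takes the elevator down every "
    "morning but only rides up to the seventh floor in the evening. Why?"
)

for temperature in (0.0, 0.7, 1.2):
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=temperature,  # higher values yield more varied answers
        messages=[{"role": "user", "content": puzzle}],
    )
    print(f"T={temperature}: {response.choices[0].message.content[:80]}")
```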