Edit Aware Representation Learning via Levenshtein Prediction

Edison Marrese-Taylor, Machel Reid, Alfredo Solano


Abstract
We propose a novel approach that employs token-level Levenshtein operations to learn a continuous latent space of vector representations capturing the semantics of the document editing process. Although our model outperforms strong baselines when fine-tuned on edit-centric tasks, it is unclear whether these results stem from domain similarities between the fine-tuning and pre-training data, suggesting that the benefits of our proposed approach over regular masked language-modelling pre-training are limited.
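The abstract's central ingredient, token-level Levenshtein operations, can be made concrete with a small illustrative sketch. The following is not the authors' implementation; it only shows how a minimal edit script (KEEP/SUB/DEL/INS labels) between a source and target token sequence can be derived with standard dynamic programming, which is the kind of supervision signal such a model could predict.

```python
def levenshtein_ops(src, tgt):
    """Return a minimal edit script turning src tokens into tgt tokens.

    Each entry is (op, src_token_or_None, tgt_token_or_None),
    with op in {"KEEP", "SUB", "DEL", "INS"}.
    """
    n, m = len(src), len(tgt)
    # dp[i][j] = edit distance between src[:i] and tgt[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == tgt[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete src[i-1]
                           dp[i][j - 1] + 1,         # insert tgt[j-1]
                           dp[i - 1][j - 1] + cost)  # keep / substitute
    # Backtrace to recover one optimal sequence of operations.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                dp[i][j] == dp[i - 1][j - 1]
                + (0 if src[i - 1] == tgt[j - 1] else 1)):
            op = "KEEP" if src[i - 1] == tgt[j - 1] else "SUB"
            ops.append((op, src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("DEL", src[i - 1], None))
            i -= 1
        else:
            ops.append(("INS", None, tgt[j - 1]))
            j -= 1
    return ops[::-1]


ops = levenshtein_ops("the cat sat".split(), "the big cat slept".split())
# One optimal script: KEEP the, INS big, KEEP cat, SUB sat -> slept
print(ops)
```

Per-token operation labels like these give an edit-aware pre-training target: the model is asked to predict, for each position, how the source must change to produce the target.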
Anthology ID:
2023.insights-1.6
Volume:
The Fourth Workshop on Insights from Negative Results in NLP
Month:
May
Year:
2023
Address:
Dubrovnik, Croatia
Editors:
Shabnam Tafreshi, Arjun Akula, João Sedoc, Aleksandr Drozd, Anna Rogers, Anna Rumshisky
Venue:
insights
Publisher:
Association for Computational Linguistics
Pages:
53–58
URL:
https://aclanthology.org/2023.insights-1.6
DOI:
10.18653/v1/2023.insights-1.6
Cite (ACL):
Edison Marrese-Taylor, Machel Reid, and Alfredo Solano. 2023. Edit Aware Representation Learning via Levenshtein Prediction. In The Fourth Workshop on Insights from Negative Results in NLP, pages 53–58, Dubrovnik, Croatia. Association for Computational Linguistics.
Cite (Informal):
Edit Aware Representation Learning via Levenshtein Prediction (Marrese-Taylor et al., insights 2023)
PDF:
https://preview.aclanthology.org/finnlp-2volume-ingestion/2023.insights-1.6.pdf
Video:
https://preview.aclanthology.org/finnlp-2volume-ingestion/2023.insights-1.6.mp4