Théo Salmenkivi-Friberg


2025

Lingonberry Giraffe: Lexically-Sound Beam Search for Explainable Translation of Compound Words
Théo Salmenkivi-Friberg | Iikka Hauhio
Proceedings of Machine Translation Summit XX: Volume 1

We present a hybrid rule-based and neural method for translating Finnish compound words into English. A lightweight set of rules splits a Finnish word into its constituent parts, and a dictionary supplies the possible translations of each part. An NMT model then ranks these alternatives to determine the final output. Because the number of candidate translations can be very large once different spellings, inflections, and word separators are taken into account, we use beam search for the ranking when that number exceeds a threshold. In both automatic and human evaluation, our method improves on using the same NMT model for end-to-end translation. We conclude that our method retains the good qualities of rule-based translation, such as explainability and controllability, while keeping the rules lightweight.
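The pipeline described in the abstract (split, look up, rank, with beam search above a threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the two-entry dictionary, the separator list, the function names, and above all `toy_score`, which stands in for an NMT model's score, are assumptions made for illustration. Inflection and spelling variants are left out.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy dictionary; a real system would use a full bilingual lexicon.
TOY_DICT: Dict[str, List[str]] = {
    "puolukka": ["lingonberry", "cowberry"],
    "kirahvi": ["giraffe"],
}
# Assumed separators between translated parts: none, space, hyphen.
SEPARATORS = ["", " ", "-"]
# Assumed weights used only by the toy scorer below.
PREFERRED = {"lingonberry": 2.0, "giraffe": 1.0}


def toy_score(source: str, candidate: str) -> float:
    """Stand-in for an NMT score of `candidate` given `source` (higher is better).

    It ignores `source` and simply sums weights of whitespace-separated tokens.
    """
    return sum(PREFERRED.get(tok, 0.0) for tok in candidate.split())


def split_compound(word: str, lexicon: Dict[str, List[str]]) -> List[List[str]]:
    """Return every segmentation of `word` whose parts are all in `lexicon`."""
    if not word:
        return [[]]
    results = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            for rest in split_compound(word[i:], lexicon):
                results.append([head] + rest)
    return results


def rank(source: str, options: List[List[str]],
         score: Callable[[str, str], float],
         beam_width: Optional[int]) -> List[str]:
    """Build candidates left to right; prune to `beam_width` if it is set."""
    def top(hyps: List[str]) -> List[str]:
        ordered = sorted(hyps, key=lambda h: score(source, h), reverse=True)
        return ordered if beam_width is None else ordered[:beam_width]

    beam = top(list(options[0]))
    for part_options in options[1:]:
        expanded = [h + sep + t
                    for h in beam for sep in SEPARATORS for t in part_options]
        beam = top(expanded)
    return beam


def translate(word: str, lexicon: Dict[str, List[str]],
              score: Callable[[str, str], float] = toy_score,
              threshold: int = 50, beam_width: int = 4) -> Optional[str]:
    """Pick the best-scoring translation, using beam search for large spaces."""
    finalists = []
    for parts in split_compound(word, lexicon):
        options = [lexicon[p] for p in parts]
        # Size of the full candidate space for this segmentation.
        n = len(SEPARATORS) ** (len(parts) - 1)
        for opts in options:
            n *= len(opts)
        width = None if n <= threshold else beam_width
        finalists.extend(rank(word, options, score, width)[:1])
    if not finalists:
        return None
    return max(finalists, key=lambda h: score(word, h))


print(translate("puolukkakirahvi", TOY_DICT))
```

Setting `threshold=0` forces the beam-search path even for this tiny example. Because every candidate is assembled from dictionary entries, the chosen output can be traced back to the split and the entries that produced it, which is the kind of explainability the abstract refers to.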