Frederic Mailhot
Also published as: Frédéric Mailhot, Fred Mailhot
2026
How much capacity does Turkish inflection require? An empirical study of GRU encoder–decoder bottlenecks.
Fred Mailhot
Proceedings of the Society for Computation in Linguistics 2026
Encoder–decoder neural networks with high-dimensional (e.g. d=300–500) embeddings and hidden layers can be used to model a variety of morphophonological phenomena as sequence-to-sequence mappings, achieving high accuracy across languages and patterns. We show here that these high-capacity models are overparameterized, at least for the task of morphological inflection, and that simpler and smaller networks can perform near ceiling on the task of inflecting Turkish stems. Moreover, these reduced-capacity models encode linguistically relevant information even when they are too small to succeed at the inflectional task.
2025
Proceedings of the 22nd SIGMORPHON workshop on Computational Morphology, Phonology, and Phonetics
Garrett Nicolai | Eleanor Chodroff | Frederic Mailhot | Çağrı Çöltekin
Proceedings of the 22nd SIGMORPHON workshop on Computational Morphology, Phonology, and Phonetics
Proceedings of the Society for Computation in Linguistics 2025
Carolyn Jane Anderson | Frédéric Mailhot | Grusha Prasad
Proceedings of the Society for Computation in Linguistics 2025
2024
Acoustic barycenters as exemplar production targets
Frederic Mailhot | Cassandra L. Jacobs
Proceedings of the 21st SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology
We present a solution to the problem of exemplar-based language production from variable-duration tokens, leveraging algorithms from the domain of time-series clustering and classification. Our model stores and outputs tokens of phonetically rich and temporally variable representations of recorded speech. We show qualitatively and quantitatively that model outputs retain essential acoustic/phonetic characteristics despite the noise introduced by averaging, and also demonstrate the effects of similarity and indexical information as constraints on exemplar cloud selection.
Proceedings of the 21st SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology
Garrett Nicolai | Eleanor Chodroff | Frederic Mailhot | Çağrı Çöltekin
Proceedings of the 21st SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology
Data Anonymization for Privacy-Preserving Large Language Model Fine-Tuning on Call Transcripts
Shayna Gardiner | Tania Habib | Kevin Humphreys | Masha Azizi | Frederic Mailhot | Anne Paling | Preston Thomas | Nathan Zhang
Proceedings of the Workshop on Computational Approaches to Language Data Pseudonymization (CALD-pseudo 2024)
Large language models in public-facing industrial applications must accurately process data for the domain in which they are deployed, but they must not leak sensitive or confidential information when used. We present a process for anonymizing training data, a framework for quantitatively and qualitatively assessing the effectiveness of this process, and an assessment of the effectiveness of models fine-tuned on anonymized data in comparison with commercially available LLM APIs.
2023
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology
Garrett Nicolai | Eleanor Chodroff | Frederic Mailhot | Çağrı Çöltekin
Proceedings of the 20th SIGMORPHON workshop on Computational Research in Phonetics, Phonology, and Morphology
2021
Avengers, Ensemble! Benefits of ensembling in grapheme-to-phoneme prediction
Vagrant Gautam | Wang Yau Li | Zafarullah Mahmood | Fred Mailhot | Shreekantha Nadig | Riqiang Wang | Nathan Zhang
Proceedings of the 18th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology
We describe three baseline-beating systems for the high-resource English-only sub-task of the SIGMORPHON 2021 Shared Task 1: a small ensemble that Dialpad’s speech recognition team uses internally, a well-known off-the-shelf model, and a larger ensemble model comprising these and others. We additionally discuss the challenges related to the provided data, along with the processing steps we took.
2019
Encoder-decoder models for latent phonological representations of words
Cassandra L. Jacobs | Frédéric Mailhot
Proceedings of the 16th Workshop on Computational Research in Phonetics, Phonology, and Morphology
We use sequence-to-sequence networks trained on sequential phonetic encoding tasks to construct compositional phonological representations of words. We show that the output of an encoder network can predict the phonetic durations of American English words better than a number of alternative forms. We also show that the model’s learned representations map onto existing measures of words’ phonological structure (phonological neighborhood density and phonotactic probability).