Michael Riley

Also published as: Michael D. Riley


2021

Approximating Probabilistic Models as Weighted Finite Automata
Ananda Theertha Suresh | Brian Roark | Michael Riley | Vlad Schogol
Computational Linguistics, Volume 47, Issue 2 - June 2021

Weighted finite automata (WFAs) are often used to represent probabilistic models, such as n-gram language models, because, among other things, they are efficient in time and space for recognition tasks. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a WFA such that the Kullback-Leibler divergence between the source model and the WFA target model is minimized. The proposed algorithm involves a counting step and a difference of convex optimization step, both of which can be performed efficiently. We demonstrate the usefulness of our approach on various tasks, including distilling n-gram models from neural models, building compact language models, and building open-vocabulary character models. The algorithms used for these experiments are available in an open-source software library.
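
As a rough illustration of the counting-and-normalizing idea (not the paper's full algorithm, which also uses a difference-of-convex step for more general WFA topologies), the toy Python sketch below computes expected bigram counts under a small explicit source distribution, normalizes them into a bigram model, and measures the KL divergence between the source and the approximation. The toy distribution and all names are hypothetical.

```python
from collections import defaultdict
import math

# Hypothetical source model: an explicit distribution over short sequences,
# standing in for the generic probabilistic source described in the abstract.
SOURCE = {
    "ab": 0.4,
    "abb": 0.3,
    "ba": 0.2,
    "b": 0.1,
}

BOS, EOS = "<s>", "</s>"

def bigram_counts(source):
    """Counting step: expected bigram counts under the source distribution."""
    counts = defaultdict(float)
    for seq, p in source.items():
        symbols = [BOS] + list(seq) + [EOS]
        for prev, cur in zip(symbols, symbols[1:]):
            counts[(prev, cur)] += p
    return counts

def normalize(counts):
    """Turn expected counts into conditional bigram probabilities."""
    totals = defaultdict(float)
    for (prev, _), c in counts.items():
        totals[prev] += c
    return {(prev, cur): c / totals[prev] for (prev, cur), c in counts.items()}

def bigram_prob(model, seq):
    """Probability the bigram model assigns to a complete sequence."""
    symbols = [BOS] + list(seq) + [EOS]
    p = 1.0
    for prev, cur in zip(symbols, symbols[1:]):
        p *= model.get((prev, cur), 0.0)
    return p

def kl_divergence(source, model):
    """KL(source || model), summed over the source's support."""
    return sum(p * math.log(p / bigram_prob(model, seq))
               for seq, p in source.items())

model = normalize(bigram_counts(SOURCE))
print("KL(source || bigram) =", kl_divergence(SOURCE, model))
```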

2019

Federated Learning of N-Gram Language Models
Mingqing Chen | Ananda Theertha Suresh | Rajiv Mathews | Adeline Wong | Cyril Allauzen | Françoise Beaufays | Michael Riley
Proceedings of the 23rd Conference on Computational Natural Language Learning (CoNLL)

We propose algorithms to train production-quality n-gram language models using federated learning. Federated learning is a distributed computation platform that can be used to train global models for portable devices such as smart phones. Federated learning is especially relevant for applications handling privacy-sensitive data, such as virtual keyboards, because training is performed without the users’ data ever leaving their devices. While the principles of federated learning are fairly generic, its methodology assumes that the underlying models are neural networks. However, virtual keyboards are typically powered by n-gram language models for latency reasons. We propose to train a recurrent neural network language model using the decentralized FederatedAveraging algorithm and to approximate this federated model server-side with an n-gram model that can be deployed to devices for fast inference. Our technical contributions include ways of handling large vocabularies, algorithms to correct capitalization errors in user data, and efficient finite state transducer algorithms to convert word language models to word-piece language models and vice versa. The n-gram language models trained with federated learning are compared to n-grams trained with traditional server-based algorithms using A/B tests on tens of millions of users of a virtual keyboard. Results are presented for two languages, American English and Brazilian Portuguese. This work demonstrates that high-quality n-gram language models can be trained directly on client mobile devices without sensitive training data ever leaving the devices.
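
The core of FederatedAveraging is a data-size-weighted average of locally trained client models. The snippet below is a minimal, hypothetical sketch of one round, using a linear least-squares model as a stand-in for the recurrent neural network language model described above; the client data, model shape, and hyperparameters are invented, and this is not the production pipeline from the paper.

```python
import numpy as np

def local_sgd(weights, X, y, lr=0.1, epochs=5):
    """One client's local training: a few epochs of gradient descent on a
    linear least-squares model (hypothetical stand-in for the RNN LM)."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def federated_averaging_round(global_weights, clients):
    """One FederatedAveraging round: each client trains locally on its own
    data; the server averages the results weighted by client data size."""
    total = sum(len(y) for _, y in clients)
    new_weights = np.zeros_like(global_weights)
    for X, y in clients:
        local = local_sgd(global_weights, X, y)
        new_weights += (len(y) / total) * local
    return new_weights

# Hypothetical setup: three clients, each with private (X, y) data that never
# leaves the "device"; only model weights are communicated to the server.
rng = np.random.default_rng(0)
true_w = np.array([1.5, -2.0])
clients = []
for n in (20, 50, 30):
    X = rng.normal(size=(n, 2))
    y = X @ true_w + 0.1 * rng.normal(size=n)
    clients.append((X, y))

w = np.zeros(2)
for _ in range(10):
    w = federated_averaging_round(w, clients)
print("estimated weights:", w)
```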

On the Compression of Lexicon Transducers
Marco Cognetta | Cyril Allauzen | Michael Riley
Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing

In finite-state language processing pipelines, a lexicon is often a key component. It needs to be comprehensive to ensure accuracy, reducing out-of-vocabulary misses. However, in memory-constrained environments (e.g., mobile phones), the size of the component automata must be kept small. Indeed, a delicate balance between comprehensiveness, speed, and memory must be struck to conform to device requirements while providing a good user experience. In this paper, we describe a compression scheme for lexicons when represented as finite-state transducers. We efficiently encode the graph of the transducer while storing transition labels separately. The graph encoding scheme is based on the LOUDS (Level Order Unary Degree Sequence) tree representation, which has constant time tree traversal for queries while being information-theoretically optimal in space. We find that our encoding is near the theoretical lower bound for such graphs and substantially outperforms more traditional representations in space while remaining competitive in latency benchmarks.
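
LOUDS encodes an ordinal tree as a bit vector: nodes are visited in level (breadth-first) order and each node contributes one 1-bit per child followed by a 0-bit, so an n-node tree takes roughly 2n bits and navigation reduces to rank/select queries. The Python sketch below is a rough illustration of that encoding on a toy tree, not the paper's transducer representation; it uses naive O(n) rank/select, where a production encoding would use succinct constant-time structures.

```python
from collections import deque

def louds_bits(tree, root):
    """Build the LOUDS bit string: a virtual super-root ('10'), then each
    node in level order contributes one '1' per child followed by a '0'."""
    bits = ["1", "0"]
    queue = deque([root])
    while queue:
        node = queue.popleft()
        children = tree.get(node, [])
        bits.append("1" * len(children) + "0")
        queue.extend(children)
    return "".join(bits)

# Naive rank/select over 1-indexed positions; a real implementation would
# use succinct data structures with O(1) query time.
def rank(bits, bit, i):
    """Number of occurrences of `bit` in bits[1..i]."""
    return bits[:i].count(bit)

def select(bits, bit, j):
    """Position (1-indexed) of the j-th occurrence of `bit`."""
    count = 0
    for pos, b in enumerate(bits, start=1):
        if b == bit:
            count += 1
            if count == j:
                return pos
    raise ValueError("not enough occurrences")

def first_child(bits, i):
    """Given a node's 1-bit position i, return its first child's position,
    or None if the node is a leaf."""
    pos = select(bits, "0", rank(bits, "1", i)) + 1
    return pos if pos <= len(bits) and bits[pos - 1] == "1" else None

def parent(bits, i):
    """Position of the parent node's 1-bit (do not call on the root)."""
    return select(bits, "1", rank(bits, "0", i))

# Hypothetical toy tree (e.g., a tiny lexicon-trie skeleton).
tree = {"r": ["a", "b"], "a": ["c"]}
bits = louds_bits(tree, "r")
print(bits)                    # 101101000
print(first_child(bits, 1))    # 3  (node 'a')
print(first_child(bits, 3))    # 6  (node 'c')
print(first_child(bits, 4))    # None ('b' is a leaf)
print(parent(bits, 6))         # 3  ('c' -> 'a')
```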

Distilling weighted finite automata from arbitrary probabilistic models
Ananda Theertha Suresh | Brian Roark | Michael Riley | Vlad Schogol
Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing

Weighted finite automata (WFA) are often used to represent probabilistic models, such as n-gram language models, since they are efficient for recognition tasks in time and space. The probabilistic source to be represented as a WFA, however, may come in many forms. Given a generic probabilistic model over sequences, we propose an algorithm to approximate it as a weighted finite automaton such that the Kullback-Leibler divergence between the source model and the WFA target model is minimized. The proposed algorithm involves a counting step and a difference of convex optimization step, both of which can be performed efficiently. We demonstrate the usefulness of our approach on several tasks, including distilling n-gram models from neural models.

Latin script keyboards for South Asian languages with finite-state normalization
Lawrence Wolf-Sonkin | Vlad Schogol | Brian Roark | Michael Riley
Proceedings of the 14th International Conference on Finite-State Methods and Natural Language Processing

The use of the Latin script for text entry of South Asian languages is common, even though there is no standard orthography for these languages in the script. We explore several compact finite-state architectures that permit variable spellings of words during mobile text entry. We find that approaches making use of transliteration transducers provide large accuracy improvements over baselines, but that simpler approaches involving a compact representation of many attested alternatives yield much of the accuracy gain. This is particularly important when operating under constraints on model size (e.g., on inexpensive mobile devices with limited storage and memory for keyboard models), and on speed of inference, since people typing on mobile keyboards expect no perceptual delay in keyboard responsiveness.
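
As a rough, hypothetical illustration of the "many attested alternatives" idea (not the paper's finite-state architecture), the sketch below stores variant Latin-script spellings in a small trie that maps each attested romanization to a canonical word, the kind of lookup a keyboard could perform on typed input. The words and variant spellings are invented examples.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.canonical = None  # set when a complete variant ends here

def add_variant(root, variant, canonical):
    """Insert one attested Latin-script spelling, mapping it to the canonical word."""
    node = root
    for ch in variant:
        node = node.children.setdefault(ch, TrieNode())
    node.canonical = canonical

def lookup(root, typed):
    """Return the canonical word for an exact variant match, else None."""
    node = root
    for ch in typed:
        node = node.children.get(ch)
        if node is None:
            return None
    return node.canonical

# Invented examples: several attested romanizations map to one canonical form.
VARIANTS = {
    "नमस्ते": ["namaste", "namastey", "namasthe"],
    "धन्यवाद": ["dhanyavad", "dhanyawad", "dhanyavaad"],
}

root = TrieNode()
for canonical, spellings in VARIANTS.items():
    for spelling in spellings:
        add_variant(root, spelling, canonical)

print(lookup(root, "namastey"))   # नमस्ते
print(lookup(root, "dhanyawad"))  # धन्यवाद
print(lookup(root, "namastay"))   # None (unattested spelling)
```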

2017

Transliterated Mobile Keyboard Input via Weighted Finite-State Transducers
Lars Hellsten | Brian Roark | Prasoon Goyal | Cyril Allauzen | Françoise Beaufays | Tom Ouyang | Michael Riley | David Rybach
Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing (FSMNLP 2017)

2016

Distributed representation and estimation of WFST-based n-gram models
Cyril Allauzen | Michael Riley | Brian Roark
Proceedings of the SIGFSM Workshop on Statistical NLP and Weighted Automata

2014

Pushdown Automata in Statistical Machine Translation
Cyril Allauzen | Bill Byrne | Adrià de Gispert | Gonzalo Iglesias | Michael Riley
Computational Linguistics, Volume 40, Issue 3 - September 2014

2013

Smoothed marginal distribution constraints for language modeling
Brian Roark | Cyril Allauzen | Michael Riley
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

2012

The OpenGrm open-source finite-state grammar software libraries
Brian Roark | Richard Sproat | Cyril Allauzen | Michael Riley | Jeffrey Sorensen | Terry Tai
Proceedings of the ACL 2012 System Demonstrations

2011

Hierarchical Phrase-based Translation Representations
Gonzalo Iglesias | Cyril Allauzen | William Byrne | Adrià de Gispert | Michael Riley
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

Expected Sequence Similarity Maximization
Cyril Allauzen | Shankar Kumar | Wolfgang Macherey | Mehryar Mohri | Michael Riley
Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics

2009

OpenFst: An Open-Source, Weighted Finite-State Transducer Library and its Applications to Speech and Language
Michael Riley | Cyril Allauzen | Martin Jansche
Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Tutorial Abstracts

2004

Statistical Modeling for Unit Selection in Speech Synthesis
Mehryar Mohri | Cyril Allauzen | Michael Riley
Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL-04)

2003

Word N-Grams for Cluster Keyboards
Nils Klarlund | Michael Riley
Proceedings of the 2003 EACL Workshop on Language Modeling for Text Entry Methods

1996

Compilation of Weighted Finite-State Transducers from Decision Trees
Richard Sproat | Michael Riley
34th Annual Meeting of the Association for Computational Linguistics

1994

The Hub and Spoke Paradigm for CSR Evaluation
Francis Kubala | Jerome Bellegarda | Jordan Cohen | David Pallett | Doug Paul | Mike Phillips | Raja Rajasekaran | Fred Richardson | Michael Riley | Roni Rosenfeld | Bob Roth | Mitch Weintraub
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994

Weighted Rational Transductions and their Application to Human Language Processing
Fernando Pereira | Michael Riley | Richard Sproat
Human Language Technology: Proceedings of a Workshop held at Plainsboro, New Jersey, March 8-11, 1994

1991

Lexical Access With a Statistically-Derived Phonetic Network
Michael D. Riley | Andrej Ljolje
Speech and Natural Language: Proceedings of a Workshop Held at Pacific Grove, California, February 19-22, 1991

1989

Some Applications of Tree-based Modelling to Speech and Language
Michael D. Riley
Speech and Natural Language: Proceedings of a Workshop Held at Cape Cod, Massachusetts, October 15-18, 1989