Anej Svete


2023

pdf
On the Representational Capacity of Recurrent Neural Language Models
Franz Nowak | Anej Svete | Li Du | Ryan Cotterell
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

This work investigates the computational expressivity of language models (LMs) based on recurrent neural networks (RNNs). Siegelmann and Sontag (1992) famously showed that RNNs with rational weights and hidden states and unbounded computation time are Turing complete. However, LMs define weightings over strings in addition to just (unweighted) language membership and the analysis of the computational power of RNN LMs (RLMs) should reflect this. We extend the Turing completeness result to the probabilistic case, showing how a rationally weighted RLM with unbounded computation time can simulate any deterministic probabilistic Turing machine (PTM) with rationally weighted transitions. Since, in practice, RLMs work in real-time, processing a symbol at every time step, we treat the above result as an upper bound on the expressivity of RLMs. We also provide a lower bound by showing that under the restriction to real-time computation, such models can simulate deterministic real-time rational PTMs.

pdf
Recurrent Neural Language Models as Probabilistic Finite-state Automata
Anej Svete | Ryan Cotterell
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing

Studying language models (LMs) in terms of well-understood formalisms allows us to precisely characterize their abilities and limitations. Previous work has investigated the expressive power of recurrent neural network (RNN) LMs in terms of their capacity to recognize unweighted formal languages. However, LMs do not describe unweighted formal languages—rather, they define probability distributions over strings. In this work, we study what classes of such probability distributions RNN LMs can represent, which allows us to make more direct statements about their capabilities. We show that simple RNNs are equivalent to a subclass of probabilistic finite-state automata, and can thus model a strict subset of probability distributions expressible by finite-state models. Furthermore, we study the space complexity of representing finite-state LMs with RNNs. We show that, to represent an arbitrary deterministic finite-state LM with N states over an alphabet 𝛴, an RNN requires 𝛺\left(N |𝛴|\right) neurons. These results present a first step towards characterizing the classes of distributions RNN LMs can represent and thus help us understand their capabilities and limitations.

2022

pdf
Algorithms for Acyclic Weighted Finite-State Automata with Failure Arcs
Anej Svete | Benjamin Dayan | Ryan Cotterell | Tim Vieira | Jason Eisner
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing

Weighted finite-state automata (WSFAs) arecommonly used in NLP. Failure transitions area useful extension for compactly representingbackoffs or interpolation in n-gram modelsand CRFs, which are special cases of WFSAs.Unfortunately, applying standard algorithmsfor computing the pathsum requires expand-ing these compact failure transitions. As aresult, na ̈ıve computation of the pathsum inacyclic WFSAs with failure transitions runs inO(|Q|2|Σ|) (O(|Q||Σ|) for deterministic WF-SAs) while the equivalent algorithm in normalWFSAs runs in O(|E|), where E representsthe set of transitions, Q the set of states, andΣ the alphabet. In this work, we present moreefficient algorithms for computing the pathsumin sparse acyclic WFSAs, i.e., WFSAs with av-erage out symbol fraction s ≪ 1. In those,backward runs in O(s|Q||Σ|). We proposean algorithm for semiring-weighted automatawhich runs in O(|E| + s|Σ||Q||Tmax| log |Σ|),where |Tmax| is the size of the largest con-nected component of failure transitions. Ad-ditionally, we propose faster algorithms fortwo specific cases. For ring-weighted WF-SAs we propose an algorithm with complex-ity O(|E| + s|Σ||Q||πmax|), where |πmax| de-notes the longest path length of failure transi-tions stemming from q and Σ(q) the set of sym-bols on the outgoing transitions from q. Forsemiring-weighted WFSAs whose failure tran-sition topology satisfies a condition exemplifiedby CRFs, we propose an algorithm with com-plexity O(|E| + s|Σ||Q| log |Σ|).