The objective of this work is to disambiguate transducers of the form T = R o D so that the determinization algorithm described in (Mohri, 1997) can be applied to them.
Our approach to disambiguating T = R o D consists of first computing the composition T and then disambiguating the transducer T. We give an important consequence of this result that allows us to compose any number of transducers R with the transducer D, in contrast to the previous approach, which consisted of first disambiguating the transducers D and R to produce D' and R' respectively, then computing T' = R' o D', where T' is unambiguous.
We will present results in the case of a transducer D representing a dictionary and R representing phonological rules.
Keywords: ambiguity, deterministic, dictionary, transducer.
1 Introduction
The task of speech recognition can be decomposed into several steps, where each step is represented by a finite-state transducer (Mohri et al., 1998).
The search space of the recognizer is defined by the composition of transducers T = A o C o R o D o M. Transducer A converts a sequence of observations O to a sequence of context-dependent phones.
Transducer C converts a sequence of context-dependent phones to a sequence of context-independent phones.
Transducer R is a mapping from phones to phones which implements phonological rules.
Transducer D is the pronunciation dictionary.
It converts a sequence of context-independent phones to a sequence of words.
Transducer M represents a language model: it converts sequences of words into sequences of words, while restricting the possible sequences or assigning a score to the sequences.
The speech recognition problem consists of finding the path of least cost in transducer O o T, where O is a sequence of acoustic observations.
The pronunciation dictionary, which represents the mapping from pronunciations to words, can show an inherent ambiguity: a sequence of phones can correspond to more than one word, so we cannot apply the transducer determinization algorithm (an operation which reduces redundancy, search time and possibly space).
This problem is usually handled by adding special symbols to the dictionary to remove the ambiguity in order to be able to apply the determinization algorithm (Koskenniemi, 1990).
Nevertheless, when we compose the dictionary with the phonological rules, we must take these special symbols into account.
This complicates the construction of transducers representing these rules and leads to size explosion.
It would be simpler to compose the rules with the dictionary, then remove the ambiguity in the result and then apply the determinization algorithm.
2 Notations and definitions
Formally, a weighted transducer over a semiring K = (K, ⊕, ⊗, 0, 1) is defined as a 6-tuple T = (Q, I, Σ1, Σ2, E, F) where Q is a finite set of states, I ⊆ Q is a finite set of initial states, Σ1 is the input alphabet, Σ2 is the output alphabet, E is a finite set of transitions and F ⊆ Q is a finite set of final states.
A transition is an element of Q × Σ1 × Σ2 × Q × K.
Transitions are of the form t = (p(t), i(t), o(t), n(t), w(t)),
where p(t) denotes the transition's origin state, i(t) its input label, o(t) its output label, n(t) the transition's destination state and w(t) ∈ K is the weight of t. The tropical semiring, defined as (ℝ+ ∪ {∞}, min, +, ∞, 0), is commonly used in speech recognition, but our results are applicable to the case of general semirings as well.
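As a small illustration of the tropical semiring, the following Python sketch spells out its two operations and their identity elements. The names `oplus`, `otimes`, `otimes_all`, `ZERO` and `ONE` are ours and do not appear in the original formalism.

```python
import math

# Tropical semiring (R+ ∪ {∞}, min, +, ∞, 0).
ZERO = math.inf  # identity of ⊕ (min) and annihilator of ⊗ (+)
ONE = 0.0        # identity of ⊗ (+)


def oplus(a: float, b: float) -> float:
    """⊕ in the tropical semiring: keep the cheaper of two alternatives."""
    return min(a, b)


def otimes(a: float, b: float) -> float:
    """⊗ in the tropical semiring: accumulate costs along a sequence."""
    return a + b


def otimes_all(weights) -> float:
    """⊗-product of a sequence of weights (ONE for the empty sequence)."""
    total = ONE
    for w in weights:
        total = otimes(total, w)
    return total
```

With these definitions, combining weights along a sequence of transitions amounts to a sum, and choosing between alternatives amounts to a minimum.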
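A path π = t1 ··· tn is a sequence of transitions such that n(ti-1) = p(ti) for 2 ≤ i ≤ n.
We can easily extend the functions p and n to paths: p(π) = p(t1) and n(π) = n(tn).
We denote by P(r, s) the set of paths whose origin is state r and whose destination is state s. We can also extend P to sets of states: for U ⊆ Q and V ⊆ Q, P(U, V) is the union of the sets P(r, s) with r ∈ U and s ∈ V.
We can extend the functions i and o to paths by concatenating the input and output symbols: i(π) = i(t1) ··· i(tn) and o(π) = o(t1) ··· o(tn).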
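The path functions can be sketched in Python as follows. This is only an illustration under our own assumptions: transitions are stored as tuples in the order (p, i, o, n, w), the empty label is encoded as the empty string `EPS`, and the toy path `PATH_CES` is hypothetical data, not taken from the figures.

```python
from typing import NamedTuple, Sequence

EPS = ""  # assumed encoding of the empty (epsilon) label


class Transition(NamedTuple):
    p: int    # origin state p(t)
    i: str    # input label i(t)
    o: str    # output label o(t)
    n: int    # destination state n(t)
    w: float  # weight w(t)


def is_path(ts: Sequence[Transition]) -> bool:
    """True if each transition starts where the previous one ends."""
    return len(ts) > 0 and all(ts[k - 1].n == ts[k].p for k in range(1, len(ts)))


def origin(ts: Sequence[Transition]) -> int:
    """p extended to paths: origin state of the first transition."""
    return ts[0].p


def destination(ts: Sequence[Transition]) -> int:
    """n extended to paths: destination state of the last transition."""
    return ts[-1].n


def input_string(ts: Sequence[Transition]) -> tuple:
    """i extended to paths: concatenated input labels, epsilons dropped."""
    return tuple(t.i for t in ts if t.i != EPS)


def output_string(ts: Sequence[Transition]) -> tuple:
    """o extended to paths: concatenated output labels, epsilons dropped."""
    return tuple(t.o for t in ts if t.o != EPS)


# Hypothetical toy path reading the phones "s e [z]" and writing the word "ces".
PATH_CES = [
    Transition(0, "s", "ces", 1, 0.0),
    Transition(1, "e", EPS, 2, 0.0),
    Transition(2, "[z]", EPS, 3, 0.0),
]
```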
Definition 1 (unambiguous transducer, (Berstel, 1979)) A transducer T is said to be unambiguous if for each w ∈ Σ1*, there exists at most one path π in T such that i(π) = w.
Definition 2 (ambiguous paths) Two paths π and σ are ambiguous if π ≠ σ and i(π) = i(σ).
Remark 1 : To remove the ambiguity between two paths π and σ, it suffices to modify i(π) by changing the first input label of the path π. This is done by introducing an auxiliary symbol such that i(π) ≠ i(σ).
Figure 1a shows an ambiguous transducer.
It is ambiguous since for the input string "s e [z]", there are two paths representing the output strings {ces, ses}.
In this figure, "eps" stands for epsilon or null symbol.
To disambiguate a transducer, we first group the ambiguous paths; we then remove the ambiguity in each group by adding auxiliary labels as shown in Figure 1b.
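The grouping and relabelling step can be sketched as follows. This is a minimal illustration, not the exact construction of Figure 1b: paths are reduced to pairs (input labels, output labels), and the auxiliary symbol is formed by appending a hypothetical suffix `@k` to the first input label, assuming `@` does not occur in the input alphabet.

```python
from collections import defaultdict


def add_auxiliary_labels(paths):
    """Make the input strings of a list of distinct paths pairwise different.

    paths: list of (input_labels, output_labels) pairs, both tuples of str.
    The first path of each ambiguous group keeps its labels; the k-th other
    path gets its first input label replaced by an auxiliary symbol.
    """
    groups = defaultdict(list)
    for idx, (inp, _out) in enumerate(paths):
        groups[inp].append(idx)

    result = list(paths)
    for inp, members in groups.items():
        first = inp[0] if inp else ""
        for k, idx in enumerate(members[1:], start=1):
            aux = f"{first}@{k}"  # hypothetical naming of auxiliary symbols
            result[idx] = ((aux,) + inp[1:], paths[idx][1])
    return result


# The two ambiguous paths of the example: same input, outputs "ces" and "ses".
AMBIGUOUS = [
    (("s", "e", "[z]"), ("ces",)),
    (("s", "e", "[z]"), ("ses",)),
]
RELABELLED = add_auxiliary_labels(AMBIGUOUS)
```

After relabelling, the second path reads `s@1 e [z]`, so the two input strings differ while the output strings are unchanged.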
Unfortunately, it is infeasible to enumerate all the paths in a cyclic transducer.
However, in (Smaili, 2001) it is shown that cyclic transducers of the type studied in this work can be disambiguated by transforming them into a corresponding acyclic sub-transducer contained in T. This fundamental property is described in detail in section 2.1.
Figure 1: (a) Ambiguous transducer (b) Disambiguated transducer
Accordingly, we apply the appropriate transformation to the input transducer.
2.1 Fundamental Property
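Definition 3 (fundamental property) A transducer T verifies the fundamental property if there is a subset of its input alphabet Σ1 such that any cycle in T contains at least one transition t whose input label i(t) belongs to that subset. We write E1 for the set of transitions whose input label belongs to that subset and E0 for the remaining transitions, so that E = E0 ⊎ E1.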
We can give a characterization of the ambiguous paths verifying the fundamental property.
First, we make the following remark:
Remark 2 : Any path π in T can be decomposed as π = f0 π0 f1 π1 ··· fn πn, with πi ∈ E0+ and fi ∈ E1+ for 1 ≤ i ≤ n, f0 ∈ E1* and π0 ∈ E0* if n ≥ 1.
If n = 0 then π = f0 π0.
Proposition 1 (characterization of ambiguous paths)
σi and πi are ambiguous (0 ≤ i ≤ n). fi and gi are ambiguous (0 ≤ i ≤ n).
We will assume that the first transition of the path belongs to E0, i.e. f0 = ε. Recall that if we want to avoid cycles, we just have to remove from T all transitions t ∈ E1.
According to Proposition 1, ambiguity needs to be removed only in paths that use transitions t ∈ E0, namely the paths πi that appear in the decomposition given in Remark 2.
Disambiguation consists only of introducing auxiliary labels in the ambiguous paths.
We denote by Asrc the set of origin states of transitions belonging to E1 and by Adst the set of destination states of transitions belonging to E1.
According to Proposition 1 and the preceding discussion, it is equivalent and simpler to disambiguate the acyclic transducer obtained from T by removing all E1 transitions.
Therefore, we introduce the operator Ψ : {Tin} → {Tout}, which accomplishes this construction.
I1 = I ∪ Adst ∪ {i}, with i ∉ Q.
F1 = F ∪ Asrc ∪ {f}, with f ∉ Q.
ET = (E \ E1) ∪ {(i, q, ε, ε, 0), q ∈ I1} ∪ {(q, f, ε, ε, 0), q ∈ F1}.
The third condition ensures the connectivity of Ψ(T) if T is itself connected.
It suffices to disambiguate the acyclic transducer Ψ(T), then reinsert the transitions of E1 in Ψ(T).
The set of paths in Ψ(T) is then P(I1, F1).
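The construction can be sketched in Python as follows. This is one possible reading of the definition, under assumptions of ours: transitions are tuples (origin, input, output, destination, weight), ε is encoded as the empty string, the fresh states are named "i" and "f", and the new ε-transitions leave the fresh state i towards I ∪ Adst and reach the fresh state f from F ∪ Asrc. The helper `has_cycle` is ours and only serves to check the result on a toy example.

```python
EPS = ""  # assumed encoding of the empty label


def psi(states, initials, finals, transitions, e1):
    """Remove the E1 transitions and reconnect the pieces with epsilon arcs."""
    e1 = set(e1)
    a_src = {t[0] for t in e1}  # origin states of E1 transitions
    a_dst = {t[3] for t in e1}  # destination states of E1 transitions
    i_new, f_new = "i", "f"     # fresh states, assumed not to be in Q
    assert i_new not in states and f_new not in states

    kept = [t for t in transitions if t not in e1]
    kept += [(i_new, EPS, EPS, q, 0.0) for q in set(initials) | a_dst]
    kept += [(q, EPS, EPS, f_new, 0.0) for q in set(finals) | a_src]
    return set(states) | {i_new, f_new}, {i_new}, {f_new}, kept


def has_cycle(transitions):
    """Depth-first search for a cycle in the state graph."""
    succ = {}
    for p, _i, _o, n, _w in transitions:
        succ.setdefault(p, []).append(n)
    white, grey, black = 0, 1, 2
    colour = {}

    def visit(q):
        colour[q] = grey
        for r in succ.get(q, []):
            c = colour.get(r, white)
            if c == grey or (c == white and visit(r)):
                return True
        colour[q] = black
        return False

    return any(colour.get(q, white) == white and visit(q) for q in list(succ))


# Toy cyclic transducer: one word followed by a "#" arc back to the start.
T_TRANSITIONS = [
    (0, "s", "ces", 1, 0.0),
    (1, "e", EPS, 2, 0.0),
    (2, "#", EPS, 0, 0.0),  # the only E1 transition
]
STATES_PSI, INITIALS_PSI, FINALS_PSI, ACYCLIC = psi(
    {0, 1, 2}, {0}, {2}, T_TRANSITIONS, [T_TRANSITIONS[2]]
)
```

On this toy input the "#" transition is removed, state 0 is reachable from the fresh initial state, and state 2 leads to the fresh final state, so the result is acyclic and still connected.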
Input: T = (Q, i, X, Y, E, F), an ambiguous transducer verifying the fundamental property.
Output: T1 = (Q, i, X ∪ X1, Y, ET, F), an unambiguous transducer, where X1 is the set of auxiliary symbols.
Tacyclic ← Ψ(T).
Path ← set of paths of Tacyclic.
Disambiguate the set Path (creating the set X1).
T0 ← build the unambiguous transducer whose paths are the disambiguated paths.
T1 ← Ψ-1(T0) (this consists of reinserting in T0 the transitions of T which were removed).
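The path enumeration step of the algorithm, and the detection of ambiguous groups in the sense of Definition 2, can be sketched as follows. This is an illustration under our own assumptions: transitions are tuples (origin, input, output, destination, weight), ε is the empty string, the transducer is acyclic (otherwise the recursion would not terminate), and the toy transducer below is hypothetical data in the spirit of Figure 1a.

```python
from collections import defaultdict

EPS = ""  # assumed encoding of the empty label


def enumerate_paths(transitions, starts, ends):
    """List all non-empty paths from a state in starts to a state in ends."""
    outgoing = defaultdict(list)
    for t in transitions:
        outgoing[t[0]].append(t)

    paths = []

    def extend(state, prefix):
        if state in ends and prefix:
            paths.append(tuple(prefix))
        for t in outgoing.get(state, []):
            prefix.append(t)
            extend(t[3], prefix)
            prefix.pop()

    for s in starts:
        extend(s, [])
    return paths


def ambiguous_groups(paths):
    """Group paths by input string and keep groups with more than one path."""
    groups = defaultdict(list)
    for path in paths:
        key = tuple(t[1] for t in path if t[1] != EPS)
        groups[key].append(path)
    return {k: v for k, v in groups.items() if len(v) > 1}


# Hypothetical acyclic transducer with two paths reading "s e [z]".
TOY = [
    (0, "s", "ces", 1, 0.0), (1, "e", EPS, 2, 0.0), (2, "[z]", EPS, 5, 0.0),
    (0, "s", "ses", 3, 0.0), (3, "e", EPS, 4, 0.0), (4, "[z]", EPS, 5, 0.0),
]
TOY_PATHS = enumerate_paths(TOY, {0}, {5})
TOY_GROUPS = ambiguous_groups(TOY_PATHS)
```

Each group returned by `ambiguous_groups` is a set of paths that must receive distinct auxiliary symbols in the disambiguation step.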
Now, we will study an important class of transducers verifying the fundamental property.
This class is obtained by composing a transducer D verifying the fundamental property with a transducer R. The composition of two transducers is an efficient algebraic operation for building more complex transducers.
We give a brief definition of composition and the fundamental theorem that ensures that the fundamental property is preserved by composition.
3 Composition
The transducer T created by the composition of two transducers R and D, denoted T = R o D, maps word x to word z if and only if R maps x to y and D maps y to z. The weight of the resulting word is the ⊗-product of the weights of y and z (Pereira and Riley, 1997).
Note that, in order to make the composition of two transitions t and e possible, we must have o(t) = i(e).
Definition 4 (Composition)
E = {eR o eS : eR ∈ ER, eS ∈ ES}.
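The transition-level composition can be sketched as follows. This is a simplified illustration rather than a full composition algorithm: it pairs every transition of the first transducer with every transition of the second whose input label equals the first one's output label, uses pairs of states as the new states, adds the weights as in the tropical semiring, and ignores both the special treatment of ε labels and the pruning of unreachable states. The tuple layout (origin, input, output, destination, weight) is our assumption.

```python
def compose_transitions(r_transitions, d_transitions):
    """Pair transitions tR, tD such that o(tR) == i(tD)."""
    result = []
    for p_r, i_r, o_r, n_r, w_r in r_transitions:
        for p_d, i_d, o_d, n_d, w_d in d_transitions:
            if o_r == i_d:  # the composition condition o(t) = i(e)
                result.append(((p_r, p_d), i_r, o_d, (n_r, n_d), w_r + w_d))
    return result


# Hypothetical one-transition transducers: R rewrites "x" as "s",
# D reads "s" and writes the word "ces".
R_TOY = [(0, "x", "s", 0, 0.5)]
D_TOY = [(0, "s", "ces", 1, 1.0), (0, "t", "tes", 2, 1.0)]
RD_TOY = compose_transitions(R_TOY, D_TOY)
```

The initial and final states of the composed transducer would be the pairs of initial states and the pairs of final states respectively.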
Let D = (QD, ID, Y, Z, ED, FD) be a transducer verifying the fundamental property.
We can write Y = Y0 ⊎ Y1 where Y0 = {i(t) : t ∈ E0} and Y1 = {i(t) : t ∈ E1}.
Theorem 1 (Fundamental) Let R be a transducer with set of transitions ER verifying the following condition:
(C) ∀t ∈ ER, o(t) ∈ Y1 ⇒ i(t) ∈ Y1.
Then the transducer T = R o D verifies the fundamental property.
π = πR o πD = (f1 o g1) ··· (fn o gn).
S o (R o D) = (S o R) o D.
Tm = Rm o Rm-1 o ··· o R1 o D.
To this end, we proceed as follows: we add the auxiliary symbols to disambiguate the transducer; then we apply determinization and finally we remove the auxiliary labels.
These three operations are denoted by θ.
Ti = θ(D) if i = 0,
Ti = θ(Ri o θ(Ti-1)) if i ≥ 1.
The size of transducer Tm can also be reduced by computing:
Tm = θ(Rm o Rm-1 o ··· o R1 o D).
The old approach:
T'm = R'm o R'm-1 o ··· o R'1 o D'
has several disadvantages.
The size of Ri for 1 ≤ i ≤ m increases considerably since the auxiliary labels introduced in each transducer have to be taken into account in all the others.
This fact limits the number of transducers that can be composed with D.
4 Application and Results
We will now apply our algorithm to transducers involved in speech recognition.
Transducer D represents the pronunciation dictionary and possesses the fundamental property.
The set of transitions of D is defined using the following notation: f is the unique final state of D, 0 is the unique initial state of D, x is any symbol and # is a symbol representing the end of a word.
All transitions t ∈ E1 are such that i(t) = #.
Any path π in E0* is acyclic.
The transducer R representing a phonological rule is constructed to fulfill condition (C) of the fundamental theorem.
The transducer D represents a French dictionary with 20000 words and their pronunciations.
The transducer R represents the phonological rule that handles liaison in the French language.
This liaison, which is represented by a phoneme appearing at the end of some words, must be removed when the next word begins with a consonant since the liaison phoneme is never pronounced in that case.
However, if the next word begins with a vowel, the liaison phoneme may or may not be pronounced and thus becomes optional.
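The effect of the liaison rule on a sequence of words can be sketched as follows. This is only an illustration of the rule on phoneme sequences, not the transducer of Figure 2. The vowel inventory `VOWELS` is a hypothetical toy set, and we assume, since the text does not cover this case, that a liaison phoneme is dropped when no word follows.

```python
from itertools import product

VOWELS = {"a", "e", "i", "o", "u"}  # hypothetical toy vowel inventory


def liaison_variants(words):
    """Return all pronunciations allowed by the optional liaison rule.

    words: list of (phonemes, liaison) pairs, where phonemes is a tuple of
    phoneme symbols and liaison is the word-final liaison phoneme or None.
    """
    options = []
    for k, (phones, liaison) in enumerate(words):
        base = tuple(phones)
        nxt = words[k + 1][0] if k + 1 < len(words) else None
        if liaison is not None and nxt and nxt[0] in VOWELS:
            # Next word starts with a vowel: the liaison is optional.
            options.append([base, base + (liaison,)])
        else:
            # Consonant next, no liaison phoneme, or (our assumption)
            # end of the sequence: the liaison is not pronounced.
            options.append([base])
    return [sum(choice, ()) for choice in product(*options)]


# "ces amis": liaison [z] is optional before the vowel-initial word.
BEFORE_VOWEL = liaison_variants([(("s", "e"), "z"), (("a", "m", "i"), None)])
# Same first word before a consonant-initial word: liaison removed.
BEFORE_CONSONANT = liaison_variants([(("s", "e"), "z"), (("t", "a"), None)])
```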
Figure 2: Transducer used to handle the optional liaison rule.
Figure 2 shows the transducer that handles this rule.
In the figure, p denotes all phonemes, v the vowels and [x] the liaison phonemes.
Table 1 shows the results of our algorithm using the dictionary and the phonological rule previously described.
[Table 1: Size reduction on a French dictionary. Columns: Transducer, Transitions.]
As we can see in Table 1, the operator θ produces a smaller transducer in all the cases considered here.
5 Conclusion and future work
We have been able to disambiguate an important class of cyclic and ambiguous transducers, which allows us to apply the determinization algorithm (Mohri, 1997) and thus to reduce the size of those transducers.
With our new approach, we do not have to take into account the number of transducers Ri and their auxiliary labels as was the case with the approach used before.
Thus, new transducers Ri such as phonological rules can be easily inserted in the chain.
The major disadvantage of our approach is that disambiguating a transducer increases its size systematically.
Our future work will consist of developing a more effective algorithm for disambiguating an acyclic transducer.
