language model             0.005404880
model size                 0.004621898
length model               0.004586515
model problem              0.004533873
trigram model              0.004481601
traditional model          0.004433592
trigrams model             0.004417521
model sizes                0.004398947
statistical model          0.004397992
average model              0.004392117
varigram model             0.004351015
model                      0.004092070
last word                  0.002560596
first word                 0.002496233
word error                 0.002478910
word problem               0.002442383
word version               0.002334694
unknown word               0.002288846
word tests                 0.002280659
word tokens                0.002278825
unseen word                0.002269141
compound word              0.002267403
language models            0.002182613
other language             0.002060442
news words                 0.001971206
length language            0.001807255
news corpus                0.001760866
wsj words                  0.001751978
large corpus               0.001681995
language modeling          0.001632861
statistical language       0.001618732
compound words             0.001582783
language modelling         0.001571474
training data              0.001555653
corpus the                 0.001489038
small corpus               0.001473693
corpus table               0.001471419
same probability           0.001440583
entire corpus              0.001429926
pruning algorithm          0.001419433
brown corpus               0.001398633
vodis corpus               0.001388179
nab corpus                 0.001372968
trec corpus                0.001370486
board corpus               0.001365660
lob corpus                 0.001365660
words                      0.001315960
language                   0.001312810
same time                  0.001247147
traditional models         0.001211325
same size                  0.001207318
same number                0.001181517
training text              0.001178109
conventional models        0.001163454
perplexity size            0.001159232
linguistic data            0.001144794
empirical data             0.001133496
sophisticated models       0.001128909
rate results               0.001124137
corpus                     0.001105620
data consortium            0.001104422
high probability           0.001101976
large number               0.001080402
large corpora              0.001025662
misleading probability     0.001023224
test text                  0.001021572
other places               0.001009457
other authors              0.001009457
poor results               0.001005808
method                     0.000994492
evaluation test            0.000994303
american news              0.000993929
available training         0.000978383
business news              0.000962040
fixed context              0.000954383
mandarin news              0.000943529
english corpora            0.000938233
perplexity calculations    0.000934230
chinese corpora            0.000933948
average perplexity         0.000929451
english wsj                0.000924964
news compound              0.000922069
news service               0.000916836
news agency                0.000916836
reuters news               0.000916836
xinhua news                0.000916836
information channel        0.000898876
perplexity reductions      0.000893096
pruning                    0.000887031
error rate                 0.000880944
models                     0.000869803
ideal information          0.000863949
recognition performance    0.000861886
information theory         0.000859697
pos phrase                 0.000854121
phrase frequencies         0.000849518
size source                0.000847848
distance classes           0.000842021
large decrease             0.000837508
large ranks                0.000837508
