language model 0.00403632
backoff model 0.00383483
distribution model 0.00383002
model pruning 0.003663363
bigram model 0.003393567
mixture model 0.003148483
common model 0.003049848
probabilistic model 0.003023271
model 0.00270266
word perplexity 0.002687395
language models 0.00250401
particular word 0.00249315
word pairs 0.002465456
preceding word 0.002440565
word pair 0.002431883
subunit word 0.002406889
word perplexity size 0.002406889
backoff models 0.00230252
distribution models 0.00229771
models pruning 0.002131053
other words 0.002016145
training data 0.001920034
bigram models 0.001861257
backoff bigram 0.001823077
bigram distribution 0.001818267
data perplexity 0.001781715
different pruning 0.001749435
term distribution 0.001700241
previous words 0.001688697
statistical language 0.00168647
backoff estimate 0.0016575
bigram pruning 0.00165161
pruning method 0.001649724
language modelling 0.001632821
language understanding 0.0016201
distribution cutoff 0.001601849
cutoff distribution 0.001601849
probability estimate 0.00157283
testing data 0.001570976
backoff weights 0.001536744
data sparseness 0.001528048
pruning methods 0.001527177
poisson distribution 0.001504198
content words 0.001501644
different bigram 0.001479639
pruning parameters 0.001442904
high probability 0.001441419
backoff scheme 0.001435604
cutoff pruning 0.001435192
distribution modelling 0.001426521
log probability 0.001417499
distribution estimating 0.001414956
training corpus 0.001397961
clustering algorithm 0.001387311
generality probability 0.001334828
language 0.00133366
domain clustering 0.001313836
pruning criterion 0.001285911
large training 0.001273964
pruning scheme 0.001264137
large corpus 0.001261337
document cluster 0.001254625
training text 0.001246623
method domain 0.001246002
table results 0.001217628
words 0.00120299
conditional probabilities 0.001195934
domain cluster 0.001182153
same year 0.001170939
same month 0.001170939
models 0.00117035
cutoff method 0.00116351
mixture method 0.001134844
backoff 0.00113217
distribution 0.00112736
bigram estimates 0.001111275
experimental results 0.001101309
small count 0.001093537
general bigram 0.001079818
different writers 0.001077622
perplexity values 0.00106885
document clusters 0.001062544
count cutoff 0.001060385
new document 0.001059681
explicit bigram 0.001049604
document frequency 0.00104785
probability 0.0010475
cutoff methods 0.001040963
novel approach 0.001037306
chinese character 0.001037126
unseen bigram 0.001030986
based bigram 0.00100814
cutoff figure 0.001005924
computation method 0.001002866
particular document 0.001002183
similar domain 0.001000059
method style 0.000997004
daily training 0.00099446
typical training 0.000991629
information loss 0.00099034
