language model 0.00348294
large word 0.003148047
model training 0.00295824
word tokens 0.002883244
word position 0.002878269
word vocabulary 0.002867261
frequent word 0.002850197
automatic word 0.002826452
move word 0.002822638
word move 0.002822638
class model 0.00281225
mixed word 0.002806615
last word 0.002804169
word classifi 0.002787798
word vocabu 0.002775567
word positions 0.002773865
word clas 0.002771474
word classifications 0.002770193
model size 0.002641961
linear model 0.002605737
bigram model 0.002494036
gramming model 0.002434272
trigram model 0.00243198
model approxi 0.002427149
training data 0.00225175
test data 0.002236328
model 0.00219375
language models 0.002187998
translation test 0.002047698
data set 0.00194714
english data 0.001932735
target data 0.001911769
parallel data 0.001873767
news data 0.001865935
web data 0.001862493
various data 0.001833701
data sets 0.001830773
machine translation 0.00176727
delta data 0.001763485
data structures 0.001757197
complete data 0.001739098
data spar 0.001738668
data sparsity 0.001737623
fourth data 0.001723336
target language 0.001713699
translation results 0.001673978
translation task 0.001672441
statistical language 0.001671208
source language 0.001659278
translation problem 0.001648811
translation system 0.001645226
translation experiment 0.001631187
language modeling 0.001615587
translation tasks 0.001589633
frequent words 0.001586597
distinct words 0.001562516
language mod 0.001558044
translation sys 0.001556603
chine translation 0.001537422
gram language 0.001526522
class models 0.001517308
preceding words 0.001517263
ilar words 0.001512194
lected words 0.001506529
different training 0.00150281
large training 0.001377177
other training 0.001350481
clustering algorithm 0.001350168
translation 0.00129863
language 0.00128919
training corpus 0.001289102
words 0.00127176
previous clustering 0.001266294
parallel training 0.001150997
initial clustering 0.001148419
guage models 0.001146014
same corpus 0.001144559
current clustering 0.001135812
exchange clustering 0.001134574
training corpora 0.001130363
rate training 0.001119788
rent clustering 0.001099984
related clustering 0.001095278
tial clustering 0.001093523
specific clustering 0.001084168
tributed clustering 0.001076179
complete clustering 0.001074115
clustering algorithms 0.001071308
clustering algo 0.001066744
test score 0.00106619
clustering tech 0.001061002
large size 0.001060898
agglomerative clustering 0.001056498
smoothing method 0.001047677
training cor 0.001046629
different combinations 0.001023936
different types 0.001009359
dev test 0.000990536
blind test 0.000982602
large corpora 0.00097856
