term                      score
------------------------  -----------
tokenization model        0.002881158
first model               0.002857082
bilingual model           0.00266688
model section             0.002660425
second model              0.002656863
monolingual model         0.002636436
markov model              0.002611728
language training         0.002547771
alignment model           0.002543062
hybrid model              0.002536736
ment model                0.002532739
ibm model                 0.00253014
hiddenmarkov model        0.002529404
model sec                 0.002529404
english word              0.002469478
other language            0.002424834
language sentence         0.002300783
source language           0.002300754
model                     0.0022803
target language           0.002142544
language pairs            0.002142065
target word               0.002121194
word pairs                0.002120715
foreign word              0.002079883
single word               0.002077177
individual word           0.002038177
language pair             0.002034141
complex word              0.002032681
gle word                  0.002029776
word gen                  0.002028646
ent language              0.002024277
language acquisition      0.002009087
word boundaries           0.002005948
word alignment            0.002001512
word alignments           0.001995517
hungarian word            0.001988158
word beginning            0.001988158
english translation       0.001857288
language                  0.0017601
translation system        0.001751282
training data             0.00169486
machine translation       0.00165762
english data              0.001637917
translation systems       0.001567284
new translation           0.001508863
translation pairs         0.001508525
tokenization models       0.001502091
chinese sentences         0.001471187
translation parameters    0.001467558
lexical translation       0.001448911
source data               0.001447843
translation com           0.001426156
english string            0.001418585
chinese segmentation      0.001417723
translation quality       0.001404201
test data                 0.001403738
chine translation         0.001379265
chinese character         0.00134872
forward probability       0.001328273
bayesian models           0.001298644
bilingual data            0.001293769
foreign words             0.001291441
new data                  0.001289492
parallel data             0.00127542
different methods         0.00126446
monolingual data          0.001263325
monolingual models        0.001257369
ing data                  0.001255737
probability changes       0.00123964
markov models             0.001232661
separate words            0.0012318
different length          0.001226144
same length               0.001222355
source corpus             0.001214613
treebank data             0.001208453
distinct words            0.001207306
emission probability      0.001202629
tokenization tokenization 0.001201716
posterior probability     0.001200804
transition probability    0.001199142
newswire data             0.001198013
stanford chinese          0.001190773
chinese treebank          0.001182246
learning tokenization     0.001171646
chinese segmenter         0.001166011
alignment models          0.001163995
different experiments     0.001160172
chinese writing           0.001159663
same number               0.00115864
character string          0.001155595
tokenization results      0.001152123
chinese segmentations     0.001148878
chinese tokenizer         0.001142482
chinese tokeniz           0.001130378
translation               0.00112656
tokenization methods      0.001108474
results results           0.00110253
test set                  0.001100928
source sentence           0.001081337
different problems        0.001081193
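Several entries in this ranking appear to be hyphenation fragments inherited from the extracted source text, for example "ment model" (align-ment model), "gle word" (sin-gle word), "chine translation" (ma-chine translation), and "ing data" (train-ing data); they are kept as the scorer produced them. The excerpt does not state how the scores were computed; their magnitudes are consistent with relative frequencies of unigrams and bigrams over a large tokenized corpus. Below is a minimal sketch of one way such a ranking could be produced under that assumption; the file name `corpus.txt` and the function `term_scores` are placeholders, not anything confirmed by the source.

```python
from collections import Counter

def term_scores(corpus_lines):
    """Relative frequency of every unigram and bigram in the corpus.

    Assumption: the scores in the table are plain relative
    frequencies; the real scoring function is not given here.
    """
    counts = Counter()
    total = 0
    for line in corpus_lines:
        tokens = line.lower().split()
        counts.update(tokens)                                          # unigrams
        counts.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))  # bigrams
        total += max(2 * len(tokens) - 1, 0)  # terms contributed by this line
    total = total or 1                        # guard against an empty corpus
    return {term: n / total for term, n in counts.items()}

# Hypothetical usage: rank terms the way the table above is ranked.
with open("corpus.txt", encoding="utf-8") as f:  # placeholder corpus file
    scores = term_scores(f)
for term, score in sorted(scores.items(), key=lambda kv: -kv[1])[:100]:
    print(f"{term}\t{score:.9f}")
```

Note that the table mixes unigrams ("model", "language", "translation") with bigrams, which a single counter over both n-gram orders reproduces naturally; a real pipeline would likely also strip punctuation and apply a minimum-frequency cutoff before ranking.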
