word segmentation 0.00490279
word distribution 0.00455312
chinese word 0.004381893
word information 0.004370204
same word 0.004340828
word unigram 0.004196173
word boundary 0.004167919
previous word 0.004111895
prior word 0.00409607
single word 0.004092577
word type 0.004089495
possible word 0.004081517
important word 0.004041253
word distributions 0.004039712
word boundaries 0.004039179
word sequences 0.003988435
word seg 0.00396836
current word 0.003960918
word segmen 0.003935974
word distribu 0.003922078
word segmenta 0.00391804
word distri 0.003913603
marked word 0.003911878
segmentation model 0.00371371
language model 0.003343599
bigram model 0.003124285
model inference 0.002859533
hdp model 0.002823373
guage model 0.002741109
model description 0.002727098
model goldwater 0.002723887
chinese words 0.002672023
other words 0.002652267
model 0.00245407
new words 0.002448027
segmented words 0.002344088
oov words 0.002248692
neighboring words 0.002230753
words 0.00193328
chinese character 0.001823833
domain corpus 0.001823117
segmentation results 0.001775518
new corpus 0.001671167
evaluation corpus 0.001634092
training data 0.001629383
chinese language 0.001628272
segmentation task 0.001588739
bigram distribution 0.001580185
current segmentation 0.001577408
proper segmentation 0.001575788
bigram language 0.001559744
further segmentation 0.001549719
character string 0.001545463
available segmentation 0.001539921
segmentation tools 0.001530034
corpus level 0.001526931
whole corpus 0.001479188
pku corpus 0.001471497
reference corpus 0.001430845
same domain 0.001364375
prior distribution 0.00136289
segmentation 0.00125964
bayesian models 0.001253639
chinese characters 0.001245265
language processing 0.001227419
natural language 0.001217463
different documents 0.0012169
other methods 0.001192618
different domains 0.001177457
mentation models 0.001169898
corpus 0.00115642
different docu 0.001152809
hdp method 0.001146634
sampling process 0.001119752
similar bigram 0.001113157
chinese terms 0.001099266
data interface 0.001093807
character 0.00108509
same time 0.001080322
bigram distributions 0.001066777
standard results 0.001064493
other approaches 0.001061335
other documents 0.001054849
method sig 0.001052082
other sentences 0.00105153
general domain 0.001050089
chinese restaurant 0.001050088
mutual information 0.001046754
same floor 0.001040482
gibbs sampling 0.001039302
domain level 0.001037208
chinese sen 0.001026649
standard evaluation 0.001026287
specific domain 0.001008655
leverage information 0.001003132
information col 0.000998827
other nonpara 0.000989768
same kind 0.000982969
segmented text 0.000980478
dirichlet process 0.00097735
