term                          score
word segmentation             0.00453991
chinese word                  0.00402679
word boundary                 0.003574501
unknown word                  0.003408761
word segmenta                 0.003283627
word boundaries               0.003283279
word identification           0.003240164
nese word                     0.003238523
monosyllabic word             0.00323642
word resolu                   0.003234001
standard model                0.002599052
chinese segmentation          0.00256372
baseline model                0.002412114
classification model          0.002405251
robust model                  0.002367371
tokenization model            0.00236654
window model                  0.002365323
processing model              0.002348272
based model                   0.002343186
radical model                 0.002334285
classical model               0.002322797
character boundary            0.002312111
character string              0.002160333
character strings             0.002137859
ing character                 0.002111853
model                         0.00209006
character classification      0.002054291
character classes             0.002046399
character vector              0.002044824
ground character              0.002010795
particular character          0.00200662
text segmentation             0.001984898
character yields              0.001973507
plain character               0.001972246
character imme                0.001972246
other words                   0.001930396
training data                 0.001875689
segmentation task             0.001872852
modeling segmentation         0.001859242
chinese characters            0.001852655
segmentation bakeoff          0.001835216
robust segmentation           0.001815731
character                     0.0017391
chinese language              0.001701955
training corpus               0.001662686
different characters          0.001643646
foreign words                 0.001615152
segment words                 0.001588962
oov words                     0.001575253
monosyllabic words            0.00154798
segmentation                  0.00153842
statistical data              0.001483689
chinese text                  0.001471778
lexical information           0.001376155
chinese texts                 0.001346935
words                         0.00131305
sighan chinese                0.001299959
data collection               0.001274475
data sparseness               0.001269353
unavoidable data              0.001260104
prior training                0.001244686
small corpus                  0.00121699
testing corpus                0.00114411
segmented corpus              0.001123508
ter training                  0.001120132
vector corpus                 0.001119621
sinica corpus                 0.001117162
training dataset              0.001110458
extensive training            0.00108958
massive training              0.001081727
neighboring characters        0.001063357
different informa             0.001056863
test corpora                  0.001040945
chinese                       0.0010253
probability values            0.001013836
several models                0.000999082
lexical knowledge             0.000998114
mentation method              0.000997717
new types                     0.000993734
current models                0.000950226
ical information              0.000946722
classification approach       0.000945368
occurrence probability        0.000939486
distributional information    0.000937545
collection models             0.000926991
actual distribution           0.000920041
language use                  0.000911966
lexical knowl                 0.000910268
lexical database              0.000908751
lexical databases             0.000908751
tokenization approach         0.000906657
tion algorithm                0.000874402
such ambiguities              0.000858946
radical method                0.000849513
training                      0.000848789
tree classifier               0.000844442
large number                  0.000834424
characters                    0.000827355
boundary frequencies          0.000817742
corpus                        0.000813897
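A minimal sketch for consuming a listing like the one above, assuming it is stored as whitespace-separated "term score" pairs, one per line, with scores in either plain decimal or E-notation as in the raw output. The filename term_scores.txt and the top-10 cutoff are illustrative assumptions, not part of the original data.

```python
# Sketch: load a ranked term/score list (whitespace-separated pairs).
# "term_scores.txt" is a hypothetical filename for the listing above.

def load_term_scores(path: str) -> list[tuple[str, float]]:
    """Parse lines of the form '<multi-word term> <score>'."""
    pairs = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            # Terms may contain spaces, so split off only the last field.
            term, score = line.rsplit(maxsplit=1)
            # float() accepts both plain decimals and E-notation (9.99082E-4).
            pairs.append((term, float(score)))
    # Sort by descending score in case the input order is not guaranteed.
    pairs.sort(key=lambda p: p[1], reverse=True)
    return pairs

if __name__ == "__main__":
    # Print the ten highest-scoring terms with uniform decimal formatting.
    for term, score in load_term_scores("term_scores.txt")[:10]:
        print(f"{term:<30} {score:.6f}")
```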
