training data 0.003193484
data corpus 0.003073816
chinese word 0.0030675499999999996
word segmentation 0.00302204
unlabeled data 0.002980879
test data 0.002945182
word corpus 0.002838116
domain data 0.0028345510000000003
data sets 0.002634594
data table 0.002624765
labeled data 0.0026067
english word 0.002600481
pku data 0.002581085
data increases 0.00257459
raw data 0.002526225
notated data 0.0025149869999999998
data sparseness 0.002513334
data frommi 0.002512372
newswire data 0.002512372
baseline features 0.0024653970000000002
word boundary 0.0024109269999999998
natural word 0.002397729
statistical features 0.002370104
word clustering 0.002368862
context features 0.002343941
distribution features 0.002343121
word types 0.002332408
total word 0.002298175
punctuation features 0.002290263
text features 0.0022889
word segmenta 0.002287577
word identical 0.002286962
word segmen 0.0022813309999999997
giga word 0.00227978
word delimiters 0.002279383
new features 0.002255611
dynamic features 0.002244236
segmentation model 0.00224364
statistics features 0.002210544
continuous features 0.002194389
chinese character 0.00218608
discrete features 0.0021858
specific features 0.002137698
namic features 0.00213769
informative features 0.002135837
tical features 0.002135466
crf model 0.001972518
corpus character 0.0019566460000000003
feature values 0.0019394680000000002
baseline feature 0.001927087
character sequence 0.001887267
features 0.00187298
feature value 0.0018546510000000001
supervised model 0.001792237
different feature 0.0017855380000000001
training corpus 0.00176356
feature templates 0.001749555
segmentation results 0.001691522
segmentation task 0.00167153
sequence label 0.001639587
chinese language 0.001615709
feature combinations 0.001601922
final feature 0.001596222
feature engineer 0.001596222
feature configuration 0.001596222
single character 0.00156216
unlabeled corpus 0.0015509550000000001
current character 0.001538142
regression model 0.00153485
other corpus 0.001525651
training algorithm 0.001524644
segmentation problem 0.001524027
based model 0.001520658
trained model 0.001516234
cws model 0.001515462
corpus method 0.001480273
last character 0.001466356
times character 0.001441757
character position 0.001429848
whole character 0.001425027
total character 0.0014167049999999999
character total 0.0014167049999999999
supervised approach 0.001410819
identical character 0.001405492
domain corpus 0.001404627
rent character 0.001398808
character reduplication 0.0013966669999999999
chinese lan 0.001365167
traditional chinese 0.001358634
chinese encyclopedia 0.001358231
label distribution 0.001357161
international chinese 0.00133533
feature 0.00133467
sequence labeling 0.0013247250000000001
segmentation prob 0.0013220570000000002
statistical information 0.001320029
training process 0.001318518
chinese giga 0.00131499
ing approach 0.0013142919999999999
ternational chinese 0.0013137589999999998
