training data 0.0043457700000000005
test data 0.003695072
domain data 0.003617269
unlabeled data 0.003579591
such data 0.003515012
annotated data 0.003431074
labeled data 0.003372168
ing data 0.00337109
partial data 0.003335488
data selection 0.0033182080000000004
available data 0.0032942460000000002
data size 0.0032662940000000003
daily data 0.0032468090000000002
development data 0.0032441680000000004
input data 0.0032418300000000002
wikipedia data 0.003237043
free data 0.003219257
character word 0.0031904410000000005
unannotated data 0.003173111
mented data 0.003164913
lected data 0.003162186
data preparation 0.003161969
notated data 0.0031615980000000003
irrelevant data 0.0031585420000000003
proper data 0.0031585420000000003
data shift 0.0031585420000000003
data αyl 0.0031585420000000003
word segmentation 0.003147192
chinese word 0.0030000680000000003
same word 0.0028962330000000002
feature model 0.002860823
segmentation model 0.002829972
lexicon word 0.002618763
baseline model 0.002603557
word boundaries 0.0025044710000000003
nese word 0.0024972400000000004
word segmenta 0.0024858140000000003
word segmen 0.0024843850000000004
acter word 0.0024819060000000003
single model 0.0024082929999999997
crf model 0.0024044319999999998
joint model 0.002301865
model parameters 0.002262272
training sentences 0.002223699
training domain 0.002198959
domain training 0.002198959
unified model 0.002193914
model benefits 0.002184129
annotated training 0.002012764
different segmentation 0.00192582
training time 0.00191714
training process 0.001890831
model 0.00188423
standard training 0.001811016
different test 0.0017931099999999999
different words 0.001787183
training examples 0.001785868
chinese segmentation 0.00174436
segmentation performance 0.001709234
same character 0.001683774
same feature 0.001671376
segmentation problem 0.001592329
domain test 0.001548261
test domain 0.001548261
annotated feature 0.0015256269999999999
single character 0.001513054
test unlabeled 0.001510583
character sequence 0.001502805
test set 0.001474618
punctuation information 0.0014735949999999999
different methods 0.0014721639999999998
training 0.00146373
unlabeled sentences 0.00145752
different pos 0.001446803
english words 0.0014400440000000001
different source 0.001434164
previous character 0.001424305
lexicon features 0.0014100190000000002
segmentation label 0.001403084
second character 0.001400523
partial segmentation 0.00139919
lexicon feature 0.0013939059999999999
crf method 0.001371951
learning problem 0.001366355
annotated words 0.0013561390000000001
segmentation accuracy 0.00135159
news domain 0.0013504979999999999
input character 0.001348781
possible segmentation 0.001346698
different domains 0.001341538
segmentation ambiguity 0.001332038
statistical features 0.0013281450000000002
segmentation labels 0.001324507
different sources 0.0013188619999999999
different amounts 0.001318016
con features 0.001313692
annotated sentences 0.001309003
domain adaptation 0.00130706
segmentation task 0.0013014699999999999
con feature 0.001297579
