word segmentation 0.0042716
chinese word 0.00426901
character words 0.00382076
patent word 0.00370162
character features 0.00354288
ctb word 0.003531539
natural word 0.003456842
word delimiters 0.003439388
word formation 0.003430991
nese word 0.003404055
word segmen 0.003398963
word boundaries 0.00338287
meaningful word 0.003377116
established word 0.003371714
modern word 0.003371714
advanced word 0.003371714
character sequence 0.003081554
chinese words 0.00281177
first character 0.002787616
character tagging 0.002690292
candidate character 0.002659071
character strings 0.002621831
character type 0.002611213
next character 0.002586892
single character 0.002551996
character position 0.0025402
labeled character 0.002528997
target character 0.00251963
novel character 0.002518465
character bigrams 0.002488882
ture character 0.002487952
character positions 0.002481883
character unigrams 0.002465223
different feature 0.002464323
data data 0.00235274
new words 0.002293538
training data 0.00227449
character 0.00218363
segmentation model 0.002163803
many words 0.002113484
chinese characters 0.002082408
different training 0.002073273
feature values 0.002036632
new features 0.002015658
single words 0.002005496
feature value 0.001976796
everyday words 0.001973275
feature templates 0.001939842
compound words 0.001917009
pound words 0.001915242
test data 0.001889337
pos tag 0.001844452
feature sets 0.001836891
data set 0.00182542
pkl feature 0.001811294
segmentation accuracy 0.001799969
pmi feature 0.001790793
patent data 0.00178362
chinese patent 0.00178189
lng feature 0.001770996
segmentation problem 0.00176305
chinese language 0.001757651
training set 0.00174717
patent training 0.00170537
pos tags 0.001703037
chinese text 0.001689479
core features 0.001657864
effective features 0.001639882
words 0.00163713
segmentation methods 0.001628184
ctb data 0.001613539
data side 0.001594226
segmentation performance 0.001589844
segmentation systems 0.001576718
patent corpus 0.001537704
ctb training 0.001535289
segmentation tech 0.001503451
treebank data 0.001499347
tag set 0.001499039
chinese treebank 0.001497617
feature 0.00148917
segmentation standards 0.001485352
segmentation guidelines 0.001479252
data sources 0.001466374
crf model 0.001462584
entire data 0.00146122
segment data 0.001454716
data preparation 0.001453999
data split 0.001453999
chinese patents 0.001453868
chinese hanzi 0.001452389
training sets 0.001445841
labeling model 0.001418311
annotated training 0.001416482
different domain 0.0014007
mutual information 0.001381607
single pos 0.001362829
test set 0.001362017
features 0.00135925
sequence labeling 0.001329662
