word model 0.00638613
different word 0.005570476
word sequence 0.005383229
word problem 0.005330962
unknown word 0.005302895
word clustering 0.005292879
word string 0.005269417
word length 0.005257844
word classes 0.005256013
target word 0.005248404
initial word 0.005201032
maximum word 0.005190633
word clusters 0.005184617
word cluster 0.005183358
word boundaries 0.005177285
structured word 0.005167392
complete word 0.005133158
vious word 0.005124249
likely word 0.005121702
word bits 0.005120269
relevant word 0.005115286
word segmenta 0.005109689
word tokenizer 0.005107842
same character 0.002507127
character set 0.002370508
character sequence 0.002353199
following character 0.002282406
character clustering 0.002262849
character classes 0.002225983
partial character 0.002213237
ing character 0.002164536
character alphabet 0.002157131
character clusters 0.002154587
character cluster 0.002153328
structured character 0.002137362
character sorts 0.002095042
character clus 0.00208644
character sort 0.002081979
identifiable character 0.00207847
character clusterings 0.00207847
pute character 0.00207847
language model 0.002008931
many words 0.001952079
tagging model 0.001908597
unknown words 0.001856525
output words 0.001851066
character 0.00182789
guage model 0.001784006
real words 0.001710268
loan words 0.001676451
model 0.00152821
training data 0.001507019
different characters 0.001492116
different feature 0.001467394
chinese characters 0.001440706
training set 0.00141654
words 0.00141155
training sentences 0.001365679
possible characters 0.001353565
such knowledge 0.001256441
training sets 0.00123572
clustering characters 0.001214519
different tag 0.001181475
new class 0.00116174
other languages 0.001155232
conversation training 0.001151486
edr corpus 0.001133563
sation corpus 0.001125402
conversational corpus 0.001125402
probability distribution 0.001106959
joint probability 0.001104515
other contexts 0.001081971
test set 0.001080686
different sets 0.001074354
such groupings 0.001072672
japanese text 0.001057716
other advantages 0.001045347
other sorts 0.001043863
other parameters 0.001042802
supposing characters 0.001031244
ual characters 0.001031244
feature levels 0.001031107
test sentences 0.001029825
other groups 0.001028764
other symbols 0.001028764
same domain 0.001027279
different cost 0.00101513
tag candidate 0.001013458
tag set 0.001011537
japanese language 0.001009649
different labels 9.99731E-4
tag sequence 9.94228E-4
new items 9.86337E-4
new item 9.8357E-4
data size 9.78284E-4
input sentence 9.76807E-4
different corpora 9.72889E-4
different purposes 9.63949E-4
possible questions 9.56049E-4
cluster information 9.54901E-4
