word features 0.00360719
word clustering 0.00344636
different word 0.00320802
word clusters 0.003029769
word context 0.002841303
cluster features 0.0026838
distributional word 0.002607515
single word 0.002521144
word types 0.002481996
word whistler 0.002477391
unigram word 0.002436667
hyphenated word 0.002430521
word identities 0.002428235
clustering data 0.0023534
clustering algorithm 0.002197854
training data 0.001942409
such features 0.001932697
context feature 0.001907283
language words 0.001888387
phrase cluster 0.001782697
feature vector 0.001758894
features vectors 0.001757442
phrase clustering 0.001755547
data set 0.001738624
same clustering 0.001722939
following features 0.00168012
global features 0.001672714
gram features 0.001663544
distributional clustering 0.001656975
conventional features 0.001652672
unigram features 0.001646957
learning algorithm 0.001642309
lexicalized features 0.001638731
feature extraction 0.001633654
feature vectors 0.001613132
corresponding cluster 0.001601817
window cluster 0.001593338
hard clustering 0.001587985
feature values 0.00158066
english data 0.001578668
unlabeled data 0.001571921
clustering algorithms 0.001568861
brown clustering 0.001558489
input feature 0.001547123
phrasal cluster 0.001544042
current clustering 0.001538938
numerical feature 0.001531375
feature functions 0.001528009
cluster members 0.001520545
unsupervised data 0.00151842
soft cluster 0.001517839
cluster centroid 0.00151366
cluster assignments 0.001507681
cluster ids 0.001506582
cluster membership 0.001505077
feature templates 0.001504934
means clustering 0.001504077
supervised data 0.0014932069999999999
soft clustering 0.001490689
clustering tens 0.001477331
web data 0.001475678
labeled data 0.00147318
token words 0.001472802
training set 0.001470053
ner data 0.001427878
frequent words 0.001415504
training corpus 0.001412208
baseline algorithm 0.001410579
features 0.00140874
language learning 0.001407742
hyphenated words 0.001405081
different type 0.001391981
same information 0.001385672
lexical information 0.001373611
textual data 0.001372792
annotated data 0.001359624
conll data 0.001359599
phrase clusters 0.001338956
enough data 0.001337127
different problems 0.001327804
different senses 0.00131158
different instances 0.001301898
different types 0.001293116
different input 0.001292263
classification task 0.001284085
information extraction 0.001279867
cluster 0.00127506
different application 0.001274183
different applications 0.001271916
feature 0.00126443
brown algorithm 0.001260523
different granularities 0.001250199
clustering 0.00124791
different capitalization 0.001241064
different components 0.00124
mutual information 0.001234694
supervised training 0.001224636
training examples 0.001217337
labeled training 0.001204609
scalable algorithm 0.001182302
