other features 0.002021824
training documents 0.001926656
data training 0.001905469
training data 0.001905469
different level 0.001881786
test documents 0.00185772
test data 0.001836533
category level 0.001833529
level category 0.001833529
training corpus 0.001769252
test corpus 0.001700316
training set 0.001656103
feature selection 0.00165351
word selection 0.00159708
negative features 0.001591529
additional feature 0.001589484
new level 0.001585935
level number 0.001562389
feature sets 0.001546577
same test 0.001518845
feature inc 0.001515183
categorization algorithm 0.001488901
level values 0.001482489
level categories 0.001471531
same corpus 0.001466929
only features 0.001448305
other categories 0.001407915
mon features 0.001394148
category hierarchy 0.001388428
correct documents 0.001385138
ument features 0.001358615
ter features 0.001357013
features associ 0.001356549
single training 0.001344231
corporate documents 0.001324612
first test 0.00132347
test document 0.001314758
negative words 0.001313132
selection algorithm 0.0013113
second test 0.001308438
different levels 0.00130359
text categorization 0.001300891
test corpora 0.001297534
level modification 0.00129601
next test 0.001271164
level numbers 0.001267428
level num 0.001263814
document corpus 0.001262842
own documents 0.001257952
unclassified documents 0.001256773
significant words 0.001256255
category levels 0.001255333
single category 0.001254318
earnings documents 0.001251309
root level 0.001249453
acquisitions documents 0.001247681
ciate documents 0.00123888
language processing 0.001234476
tions documents 0.001231835
high frequency 0.001230455
appropriate level 0.001229783
porate documents 0.001229319
multicategory documents 0.001229319
such words 0.001228047
feature 0.00122791
specific category 0.001227744
overall algorithm 0.00120778
training docu 0.001205671
other branches 0.001201456
algorithm the 0.00120069
value precision 0.001199544
corporate category 0.001198147
original algorithm 0.001195448
test sets 0.001194783
other hand 0.001192952
particular category 0.001186052
algorithm description 0.001181184
other adjustments 0.001172961
similar results 0.001171009
different cate 0.001168147
test files 0.001161978
action algorithm 0.001159396
high precision 0.001148233
reuters corpus 0.001146008
new hierarchy 0.001140834
test doc 0.001140175
test docu 0.001136735
leaf category 0.001136399
categorization system 0.001130259
earnings category 0.001124844
interior category 0.001112854
relative frequency 0.001110758
features 0.00110705
category ranks 0.00110508
category cpi 0.001102709
sitions category 0.001102709
category bins 0.001102709
large number 0.001100634
natural language 0.001093134
categorization performance 0.001089042
