such features 0.00216198
cluster features 0.002054368
feature set 0.001893219
distributional features 0.001834473
orthographic features 0.001813138
features 0.00158799
baseline feature 0.001576541
training corpus 0.001551553
text corpus 0.00152519
training set 0.001516335
training data 0.001501615
information extraction 0.001487287
learning algorithm 0.001455757
data set 0.001437538
distinguishing feature 0.001405593
fault feature 0.001391764
entity data 0.001287517
different approach 0.001286329
test set 0.00128251
training documents 0.001270939
boundary learning 0.001251991
ing algorithm 0.001240281
cluster corpus 0.001227725
mutual information 0.001184172
unsupervised corpus 0.001180702
learning problem 0.001169388
feature 0.00116709
clustering corpus 0.001163592
single text 0.00116316
ner data 0.001161123
small training 0.001149646
supervised training 0.001132843
first name 0.001128623
machine learning 0.001127839
entity type 0.001126896
literal word 0.001125189
same document 0.00112028
news corpus 0.001116206
word tests 0.001115814
unlabeled text 0.001115376
information retrieval 0.001111944
corpus annotation 0.001107602
same test 0.001105228
supervised learning 0.001103144
set size 0.001101216
training time 0.001097326
corpus analysis 0.001096012
unlabeled set 0.001077662
name cluster 0.001071757
information objective 0.001069695
shannon information 0.001063347
tual information 0.001063347
substantial training 0.001060859
same term 0.001054064
semantic categories 0.001048263
entropy model 0.001045966
training wildcards 0.001045165
clustering approach 0.001044071
active learning 0.001043804
data sets 0.00103906
documents person 0.001025477
text fragment 0.001024862
little training 0.001022888
validation set 0.001016378
comparable training 0.001013996
training runs 0.001013996
journal corpus 0.001009401
learning framework 0.001008488
semantic dimension 9.97683E-4
count data 9.94567E-4
collins model 9.9417E-4
corpus yields 9.89121E-4
corpus vocabu 9.87969E-4
first names 9.87553E-4
entity fields 9.71967E-4
single document 9.7075E-4
set sizes 9.62224E-4
recognition performance 9.59341E-4
same problem 9.57728E-4
extraction time 9.56336E-4
overall performance 9.49739E-4
extraction setting 9.48907E-4
curve performance 9.47003E-4
ner system 9.46552E-4
same fields 9.44706E-4
name types 9.3754E-4
boundary tokens 9.36525E-4
proprietary data 9.35522E-4
different labeling 9.32745E-4
pattern learner 9.32679E-4
initial boundary 9.24118E-4
related type 9.20558E-4
algorithm let 9.20484E-4
different levels 9.18686E-4
tion extraction 9.18096E-4
bwi models 9.1712E-4
entity types 9.08269E-4
different hypothesis 9.06076E-4
obvious approach 9.04409E-4
port performance 9.03847E-4
