visual features 0.002584505
html features 0.002431252
textual features 0.002317365
document features 0.002272658
layout features 0.00211978
tual features 0.002105967
information extraction 0.001930567
features 0.00188966
kernel model 0.001658263
space information 0.001653666
size information 0.001642374
textual information 0.001639915
relevant information 0.001609027
feature name 0.001596353
ing information 0.001592967
spatial information 0.001560458
feature space 0.001538316
important information 0.001535213
listing information 0.001534588
format information 0.001480417
address information 0.001468444
layout information 0.00144233
author information 0.00142892
contact information 0.001428647
tial information 0.001428441
dated information 0.001427243
data extraction 0.001403068
model 0.00131845
information 0.00121221
word attributes 0.001156532
visual approach 0.0011478
tion extraction 0.001101817
results results 0.001101052
feature 0.00109686
entity recognition 0.001071878
semantic level 0.001060356
state data 0.001036581
word window 0.001030041
entity boundaries 0.001025645
title extraction 0.001022306
different media 0.00102115
word frequencies 0.001012515
text node 0.001006087
extraction techniques 0.001004652
learning approach 0.001002452
html structure 0.000966687
space type 0.000961054
visual elements 0.000950772
visual characteristics 0.000948367
text elements 0.000933793
traditional set 0.000929213
performance gain 0.000922901
performance differences 0.000922089
common html 0.000921881
text seg 0.000920949
performance gains 0.000920888
color space 0.000920042
classifier performance 0.000915806
visual clues 0.000915482
visual consistency 0.000915392
visual prominence 0.000910016
text chunks 0.000895223
sifier performance 0.000888594
html style 0.000884397
dom tree 0.000879476
html documents 0.000879461
space size 0.00087162
classification task 0.000867299
comparison set 0.000865932
content tree 0.000861867
size size 0.000860328
dard set 0.000857293
first page 0.000855255
learning method 0.000855113
same time 0.000851877
machine learning 0.000849245
html dom 0.000834986
white space 0.00082741
relevant size 0.000826981
file system 0.000825053
ner task 0.000823914
ing space 0.000822213
web pages 0.000813617
name description 0.000813145
html parser 0.000809295
basic color 0.000805917
common structure 0.000805384
problem definition 0.0008035
html pages 0.000801096
related work 0.000797628
full name 0.000793906
computed html 0.000793358
space types 0.00078731
entity 0.000786469
new version 0.00078556
essential list 0.000784632
textual representation 0.000784024
html resources 0.000783165
local file 0.000781452
phone number 0.000779273
