language model 0.002958911
training data 0.00291154
segmentation model 0.002413096
markov model 0.002374604
first data 0.002369795
segment model 0.002321918
data set 0.00231326
test data 0.002308251
transition model 0.00229198
ing data 0.002271653
model the 0.00225624
annotated data 0.002222029
guage model 0.002221836
tation model 0.002220868
unconstrained model 0.002209566
abilistic model 0.002209186
data sources 0.00218073
textual data 0.002137192
data sets 0.002124704
actual data 0.002124137
independent data 0.002109019
pisces data 0.00210779
data our 0.002102633
amphibians data 0.002089858
model 0.00199795
database field 0.001754153
input text 0.001641385
word search 0.001609359
text document 0.001605984
many text 0.001547305
database records 0.00149063
text classifier 0.001489048
supervised training 0.001477012
database record 0.001470572
key word 0.001460287
text types 0.001459569
text documents 0.001448742
unsupervised training 0.001422943
language models 0.001413765
nal word 0.001385733
annotated training 0.001383549
field label 0.001369926
label field 0.001369926
learning field 0.001369453
raw text 0.00136668
similar information 0.00136519
free text 0.001360354
text docu 0.001357533
many database 0.001353786
training procedure 0.001350184
label other 0.001336523
database matching 0.001331943
database fields 0.001317729
training sequences 0.001316222
simple database 0.00129609
training sets 0.001286224
generated training 0.001281776
ferent training 0.001280577
realistic training 0.001274301
labelled training 0.001272414
language modelling 0.001266898
numeric features 0.001265355
structured database 0.001253706
language processing 0.001253448
artificial training 0.001252953
training mate 0.001252552
corresponding database 0.00125099
perfect training 0.001250206
example information 0.001244695
information extraction 0.001241173
orthographic features 0.001237597
original database 0.001224563
field segmentation 0.001220728
single field 0.001213199
natural language 0.001211404
segment information 0.00121072
database columns 0.001198117
database entries 0.001192494
bigram language 0.001190801
first approach 0.001189966
field type 0.001189652
different labels 0.001187541
probabilistic language 0.001187353
gram language 0.001184941
database lookup 0.001181585
specimen database 0.00118089
database corre 0.001179767
database row 0.001175969
sri language 0.001172647
database cells 0.001167803
field sequence 0.001166068
database cell 0.001164257
amphibians database 0.001163419
sophisticated database 0.001163019
other approaches 0.001158402
annotated field 0.001152601
approach models 0.001147985
field book 0.001135223
related information 0.001132242
species information 0.001131566
