text data 0.00391406
training data 0.003077387
language model 0.00301151
set data 0.002965253
data set 0.002965253
such data 0.002919279
language text 0.00284872
input data 0.002827987
source data 0.002788628
data source 0.002788628
annotated data 0.002765938
ace data 0.002765615
specific data 0.0027511
real data 0.002729135
labeled data 0.002714024
data sets 0.002702208
data average 0.002700489
data mining 0.002682737
transactions data 0.0026734
noisy data 0.002655801
clean data 0.00265316
data genres 0.002646217
data formats 0.002640896
unstructured data 0.002626261
transaction data 0.002624023
quality data 0.002622224
data genre 0.002609488
opment data 0.002608917
cient data 0.002608917
unconstrained data 0.002608917
data imperfections 0.002608917
model training 0.002327937
training text 0.002165147
english text 0.00211276
single model 0.00203122
detection model 0.001985335
baseline model 0.001981773
markov model 0.001965557
input text 0.001915747
text input 0.001915747
combination model 0.001895721
text source 0.001876388
guage model 0.001872688
appropriate model 0.001862247
language classifier 0.001843681
spanish text 0.001815233
text sources 0.001768145
noisy text 0.001743561
clean text 0.00174092
ful text 0.001729408
target language 0.001726288
text con 0.001717649
language material 0.001717531
unknown text 0.001707291
plain text 0.001700871
text cau 0.001697533
text conventions 0.001697533
model 0.0016637
language content 0.001637685
language classi 0.001546312
predetermined language 0.001543811
mary language 0.001543811
high system 0.001475143
other models 0.001432641
tion system 0.001423693
baseline system 0.001411363
other languages 0.00139211
system output 0.001373155
processing system 0.00136733
original system 0.001364173
language 0.00134781
line system 0.001342331
gazetteer system 0.001338859
recognition system 0.001334083
clean system 0.0013333
multilingual system 0.001329802
system combination 0.001325311
combination system 0.001325311
other features 0.001314249
system look 0.001305912
perimental system 0.001297918
memm system 0.00129295
english training 0.001276087
additional information 0.00126151
untranslatable word 0.001239946
other source 0.001218614
training set 0.00121634
other types 0.001214799
other material 0.001212857
various information 0.001177553
other classifiers 0.001168259
different research 0.001161682
english test 0.001156775
good english 0.001147131
information retrieval 0.001137762
information fields 0.001121195
gazetteer information 0.001111278
test set 0.001097028
system 0.00109329
research problem 0.001087175
