training data 0.00312674
new data 0.002938172
data set 0.002922019
test data 0.002881474
data sets 0.002739958
travel data 0.002703528
annotated data 0.002663
internet data 0.002656797
data our 0.002609666
net data 0.002592699
different language 0.002396478
english word 0.00233764
such language 0.002285379
english words 0.001987936
language identification 0.001946957
natural language 0.001934205
language classification 0.001913921
language changes 0.001908325
language origin 0.001887296
base language 0.001863961
language idenfication 0.001862056
advanced language 0.0018538
language mix 0.001852361
language preference 0.001852361
german word 0.001809403
english web 0.001754339
word queries 0.001702666
language 0.00162957
german text 0.001613493
new text 0.001596092
word sense 0.001555102
english tokens 0.001527104
word forms 0.001513975
word computer 0.001504332
man word 0.001488653
web corpus 0.001466347
german words 0.001459699
word anbieter 0.001455375
english database 0.001454904
inflected word 0.001448571
word provider 0.001448571
word shapes 0.001448571
english lexicon 0.001432828
lookup system 0.001417612
english inclusions 0.001385013
full english 0.001384711
english lexi 0.001371842
english plural 0.001365134
english ones 0.001354576
english expressions 0.001349955
unsupervised system 0.001346154
english databases 0.00134227
english inclu 0.001338778
english loan 0.001337817
different languages 0.001295494
man text 0.001292743
current system 0.001292139
text categorisation 0.001256415
tokenised text 0.001251907
standard corpus 0.001242257
efficient system 0.001237377
other tokens 0.001236134
different feature 0.001234739
small training 0.00122886
german web 0.001226102
system description 0.001221308
foreign words 0.001161759
english 0.00111323
eign words 0.00110569
newspaper corpus 0.001105043
loan words 0.001099293
new languages 0.001096178
overall corpus 0.001093614
different domains 0.001074721
our corpus 0.001064324
markov model 0.001060991
other hand 0.001055322
corpus negra 0.001048797
annotated training 0.00104858
other parts 0.001045788
other sectors 0.001045788
different use 0.00104209
model tagger 0.001029149
web documents 0.001029064
feature set 0.00101927
trained model 0.001002497
different semantics 0.001000719
new material 0.000998541
same domain 0.000992665
machine translation 0.000982413
system 0.000979281
standard set 0.000968458
wide web 0.000961917
lookup results 0.000944093
morphological analysis 0.000934486
domain experiments 0.000925649
german texts 0.000923971
valuable information 0.000918986
such homographs 0.000916512
additional feature 0.00091088
