word corpus 0.002585097
corpus web 0.002322268
newspaper corpus 0.002240837
large corpus 0.002227512
corpus evaluation 0.0021688790000000003
national corpus 0.002164367
corpus extracts 0.00216345
corpus processing 0.002151369
particular corpus 0.002135619
corpus release 0.002077996
corpus chunk 0.002077092
enpc corpus 0.002075117
corpus resource 0.002068293
cetempúblico corpus 0.0020487640000000002
corpus article 0.0020474350000000002
final corpus 0.002043736
corpus contents 0.0020423810000000002
corpus 0.00182433
other corpora 0.001376876
such information 0.00136181
newspaper language 0.001354411
different corpora 0.001331684
portuguese language 0.0012985470000000002
other cases 0.001271082
similar information 0.001183286
different ones 0.0011827209999999999
word newspaper 0.0011772739999999999
foreign words 0.0011772129999999999
text source 0.001173689
language engineering 0.001157026
test data 0.001154227
other problems 0.001154183
different markup 0.001149139
different way 0.001143725
sentence tags 0.001139021
other newspapers 0.0011267159999999998
portuguese word 0.00112141
newspaper text 0.0010970939999999998
other consequences 0.001092827
other objects 0.001092827
other sports 0.001092827
other people 0.001092827
different categories 0.001082322
different strategies 0.001070464
sentences sentence 0.001059729
kind different 0.0010506089999999999
different tokenizers 0.0010476539999999999
much information 0.00103881
evaluation data 0.001028623
original text 0.001027105
single text 0.001014969
accompanying information 0.00100977
full sentence 0.001008719
single sentence 9.743359999999999E-4
same classification 9.72662E-4
sentence size 9.70595E-4
unrecognized words 9.617790000000001E-4
text format 9.566329999999999E-4
text files 9.525079999999999E-4
hyphenated words 9.43907E-4
words subcorpus 9.43907E-4
sentence separation 9.42888E-4
special text 9.40085E-4
language 9.37904E-4
text chunk 9.33349E-4
possible tags 9.235439999999999E-4
newspaper corpora 9.17395E-4
soccer results 9.12213E-4
same grounds 9.00311E-4
same checksum 9.00311E-4
text formats 8.97231E-4
preliminary results 8.92547E-4
definite results 8.912410000000001E-4
sentence boundary 8.910109999999999E-4
other 8.75988E-4
noun thigh 8.68563E-4
several paragraphs 8.59761E-4
several patches 8.5752E-4
refining sentence 8.56416E-4
distribution process 8.38972E-4
different 8.30796E-4
separation tags 8.020009999999999E-4
author tags 8.019139999999999E-4
first names 8.007839999999999E-4
only source 7.99077E-4
foreign spelling 7.96857E-4
information 7.91362E-4
hash table 7.91305E-4
size distribution 7.878150000000001E-4
source classification 7.82741E-4
portuguese newspaper 7.7715E-4
extract structure 7.70553E-4
newspaper section 7.65449E-4
original newspaper 7.63025E-4
small sentences 7.61688E-4
many differences 7.59933E-4
first copies 7.595309999999999E-4
first thing 7.595309999999999E-4
first comment 7.595309999999999E-4
actual corpora 7.58862E-4
