character features 0.003201556
additional features 0.002777753
features tokens 0.002731199
srn features 0.002698711
category features 0.002665968
nice features 0.002597544
different feature 0.00259564
features 0.00236986
feature learning 0.002173326
accuracy word 0.002080045
single word 0.002074149
feature combination 0.002038955
word tokenization 0.002035493
unsupervised feature 0.002005982
basic feature 0.001994275
word embeddings 0.001986787
feature sets 0.001986491
typographic word 0.001967053
feature engineering 0.001945995
ual feature 0.001892226
training data 0.001847205
test data 0.001697626
feature 0.00166427
language model 0.001630954
text segmentation 0.001616323
segmentation model 0.001529577
hyphenated words 0.001503361
different context 0.001465519
italian data 0.001426157
original text 0.001382205
dutch data 0.00137925
data sets 0.001360171
unlabeled text 0.001351219
raw text 0.001338912
test set 0.001325521
text embeddings 0.001306077
tokenized text 0.001302907
text windows 0.001299348
text representations 0.001288541
sentence segmentation 0.001288289
mented text 0.001288239
words 0.00127415
different languages 0.001239329
current character 0.001235458
different window 0.001233524
character level 0.001229741
different combinations 0.001218746
next character 0.001193446
different sizes 0.001178369
news corpus 0.001167832
different domains 0.001165831
previous character 0.001157723
sentence boundary 0.001155878
character sequences 0.001138996
character strings 0.001132407
error rate 0.00112626
unicode character 0.001111973
focus character 0.00109548
character codes 0.001080758
error rates 0.001072718
character code 0.001070159
potential sentence 0.001055255
english test 0.001052172
training phase 0.001040301
capitalization information 0.001030814
same method 0.001020322
other approaches 0.001016681
segmentation task 0.001015654
rules language 0.001008134
art sentence 9.99015E-4
language domain 9.96491E-4
other steps 9.91905E-4
srn language 9.88031E-4
sentence boundaries 9.84228E-4
tagging method 9.79401E-4
final test 9.78695E-4
gmb corpus 9.76224E-4
model 9.71774E-4
detection system 9.6756E-4
learning task 9.66907E-4
sentence segmenta 9.6323E-4
sentence splitters 9.58439E-4
language processing 9.55842E-4
similar characters 9.55091E-4
development set 9.4915E-4
natural language 9.34088E-4
first row 9.32418E-4
many errors 9.17131E-4
learning methods 9.16941E-4
exact test 9.15413E-4
entropy models 9.13318E-4
final results 9.09256E-4
high performance 8.94016E-4
single characters 8.9141E-4
context size 8.89102E-4
binomial test 8.87267E-4
sentences tokens 8.79099E-4
previous models 8.73906E-4
future work 8.61331E-4
important problem 8.60775E-4
