tokenization rules 0.003167476
repp rules 0.002598313
complex rules 0.002523419
extra rules 0.002505477
subsequent rules 0.002497096
replacement rules 0.002465357
dozen rules 0.002456003
ness rules 0.002456003
rules 0.00223208
repp rule 0.002054663
rule sets 0.002041165
rule application 0.002037613
gle rule 0.001949586
ation rule 0.001917041
rule files 0.001914783
same text 0.001446773
language text 0.001372968
ptb tokenization 0.001352554
tokenization process 0.001311474
tokenization accuracy 0.00130681
repp tokenization 0.001301629
final tokenization 0.001292462
other words 0.001277889
treebank tokenization 0.001269433
tokenization methods 0.00125717
original text 0.001233386
tokenization conventions 0.001233325
current tokenization 0.00122717
tokenization errors 0.00122095
tokenization differences 0.001217138
gold tokenization 0.001196644
script tokenization 0.001189493
traceability tokenization 0.001168446
tokenization differing 0.001165865
tokenization discrepancies 0.001158911
alized tokenization 0.001158911
text spans 0.001145133
input text 0.001143735
source text 0.001128136
ptb data 0.001086974
different nlp 0.001074337
different conventions 0.001073658
task data 0.001065595
process data 0.001045894
journal text 0.001045155
raw text 0.001040325
data sets 0.001022551
text normalization 0.001019466
first character 0.001015096
wsj text 0.001010056
group reference 0.001005646
unchanged text 0.001001461
first phase 9.79134E-4
original string 9.69604E-4
other abbreviations 9.46866E-4
other aspects 9.36065E-4
tokenization 9.35396E-4
method sentences 9.26969E-4
same effects 8.95909E-4
extant data 8.92787E-4
full string 8.85971E-4
parsing work 8.84217E-4
syntactic analysis 8.69613E-4
ing repp 8.57751E-4
natural language 8.34157E-4
capture group 8.30071E-4
reference implementation 8.2539E-4
original input 8.22959E-4
syntactic annotation 7.9403E-4
levenshtein method 7.87634E-4
output tokens 7.82431E-4
special cases 7.8036E-4
token boundaries 7.77736E-4
use cases 7.73475E-4
pertinent information 7.72456E-4
regular expressions 7.71639E-4
token lattice 7.54637E-4
large number 7.49852E-4
ptb distribution 7.43696E-4
syntactic ambiguity 7.41267E-4
group boundaries 7.40578E-4
token objects 7.40536E-4
regular expression 7.39922E-4
token object 7.36984E-4
common methods 7.30157E-4
separate tokens 7.2009E-4
ptb conventions 7.15087E-4
group references 7.13846E-4
sentences distance 7.09894E-4
common conventions 7.06312E-4
corner cases 7.05613E-4
problematic cases 7.02867E-4
repp framework 7.02863E-4
formal complexity 7.02704E-4
final period 7.02017E-4
group invocation 7.0068E-4
group fires 7.0068E-4
capture groups 6.95077E-4
lar languages 6.93965E-4
textual form 6.9217E-4
