term                          score
language model                0.002403073
text data                     0.00221616
tagging model                 0.001926443
crf model                     0.001878633
standard word                 0.00185488
probabilistic model           0.001848126
network model                 0.001843529
source model                  0.001818424
unified model                 0.001815192
word case                     0.001794517
channel model                 0.001791494
classification model          0.001788598
specialized model             0.001775927
first word                    0.001670009
same data                     0.001631868
text normalization            0.001630317
data corpus                   0.001567463
model                         0.00155169
word level                    0.001547479
language models               0.001528387
word correction               0.001499089
uppercase word                0.001488404
word amc                      0.001480735
large data                    0.00147472
word casing                   0.001472948
misspelled word               0.001471753
word misspelled               0.001471753
data sets                     0.001461887
email data                    0.001455159
text figure                   0.001411582
setting data                  0.001406539
labeled data                  0.001402847
data cleaning                 0.001395723
data conversion               0.001369954
input text                    0.001349572
informal text                 0.001339146
learning models               0.001335754
raw text                      0.001311561
standard words                0.00130561
text normalization            0.001290246
state features                0.001282298
sentence sentence             0.001255692
same features                 0.001237071
state feature                 0.001215229
language texts                0.001195308
token sequence                0.001168789
natural language              0.001166852
language processing           0.001130675
normalization method          0.001106877
language modeling             0.001102587
token table                   0.001084056
transition features           0.001080557
language processing           0.001073514
feature value                 0.001069433
token type                    0.001027743
transition feature            0.001013488
small set                     0.001012678
token number                  0.001008695
tagging approach              0.001006519
different types               0.001003974
different tokens              0.001002048
normalization problem         0.000999555
binary features               0.000991301
words calculation             0.000978639
different techniques          0.000966671
special words                 0.000965381
efficient algorithm           0.000962043
token detection               0.000957875
sentence boundary             0.00095665
token deletion                0.000950203
random fields                 0.000945354
different levels              0.000941489
experiments results           0.000929613
sentence level                0.000928025
token deletion                0.000926358
confusing words               0.000925932
machine learning              0.000923597
dependent models              0.000923532
independent models            0.000919647
ssary token                   0.000919624
token boundary                0.000919624
local information             0.000916222
normalization experiments     0.000914189
ent approach                  0.000904194
sentence boundaries           0.000902507
specialized models            0.000901241
cascaded models               0.000899872
ing punctuation               0.000895588
unified approach              0.000895268
possible tag                  0.000893683
modeling approach             0.00088297
dependent approach            0.000878294
independent approach          0.000874409
viterbi algorithm             0.00087016
experimental results          0.000869133
tag sequence                  0.000868722
classification approach       0.000868674
iterative algorithm           0.000868316
previous work                 0.000865916
cascaded approach             0.000864678
