topic model 0.00382943
language model 0.00361728
topic models 0.00296728
trigram model 0.002755396
language models 0.00275513
topic distribution 0.002621582
text word 0.002611056
code word 0.00251382
strong model 0.002497229
prediction model 0.002466467
lda model 0.002451125
model type 0.002412684
word tokens 0.002366862
model package 0.002362195
comment word 0.002264054
token word 0.002205655
topic mixture 0.002145213
different models 0.002142508
word extractor 0.002128011
word dictionary 0.002125203
model 0.00212125
word train 0.002102615
observed word 0.002092672
lda topic 0.002038055
joint topic 0.001996136
ing language 0.001993373
training data 0.00193664
topic associ 0.001928479
topic asso 0.001924715
cific topic 0.001924715
trigram models 0.001893246
text words 0.001836196
statistical language 0.001834043
natural language 0.001813406
language documents 0.001796871
language processing 0.001752323
code words 0.00173896
language queries 0.001719358
gramming language 0.001718684
language syntax 0.001717462
berkeley language 0.001711979
language caption 0.001711979
topic 0.00170818
ing data 0.001633323
same data 0.001624544
lda models 0.001588975
multiple models 0.001578748
document words 0.001571802
previous words 0.001531702
above models 0.001515426
guage models 0.001496678
language 0.00149603
average words 0.001493755
comment words 0.001489194
nlp models 0.001487574
data source 0.00148725
various data 0.001463449
data sources 0.001427369
text topics 0.001425343
background data 0.001361743
mixture distribution 0.001350435
likely words 0.001317437
different set 0.001276493
different comment 0.001272942
models 0.0012591
training dataset 0.0012505
text tokens 0.001228878
other comments 0.001222343
speech prediction 0.001210217
joint distribution 0.001201358
main training 0.00118798
document topics 0.001160949
other class 0.001149577
marginal distribution 0.001137617
code tokens 0.001131642
comment text 0.00112607
different sets 0.001124333
multiple training 0.001120308
different num 0.001114624
code document 0.001111442
words 0.00109966
training sources 0.001092049
training methodology 0.001058603
training sets 0.001041585
code completion 0.001033422
training datasets 0.001026113
standard code 0.001025049
third training 0.001024837
training sce 0.001023729
software code 0.001021883
source code 0.00099057
specific code 0.000972387
case code 0.000969535
latent topics 0.000966184
document tokens 0.000964484
glish text 0.000953979
code documents 0.000940141
ing comments 0.000938457
distribution 0.000913402
results table 0.000898892
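The ranked list above pairs each n-gram with a weight. The original weighting scheme is not specified here (it may be, for example, an LDA topic-word probability rather than a raw count), so the following is only a minimal illustrative sketch of one simple scheme, relative bigram frequency, with hypothetical input text:

```python
from collections import Counter

def ngram_scores(tokens, n=2):
    """Score each n-gram by its relative frequency in the token stream.

    Illustration only: the weights in the list above may come from a
    different scheme (e.g. topic-word probabilities from an LDA model).
    """
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    total = len(grams)
    counts = Counter(grams)
    return {gram: count / total for gram, count in counts.items()}

# Hypothetical example text, not from the original corpus.
tokens = "the topic model assigns each word a topic model weight".split()
scores = ngram_scores(tokens, n=2)
# "topic model" occurs twice among the 9 bigrams, so its score is 2/9.
```

Sorting the resulting dictionary by score in descending order would yield a ranked term list of the same shape as the one above.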
