COLLINS

Sentences identical: 230
Not identical but same number of n-gram overlaps (original used): 233
Original preferred based on overlap of
- 4-grams: 324
- trigrams: 108
- bigrams: 107
- unigrams: 86
( total: 625 )
Reordered preferred based on overlap of
- 4-grams: 477
- trigrams: 150
- bigrams: 150
- unigrams: 135
( total: 912 )
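
Note on reading the categories: the labels suggest that each sentence falls into exactly one category, decided by a back-off comparison. The two outputs are compared on 4-gram overlap with the reference first, and the comparison falls back to trigrams, bigrams and unigrams only when the higher order is tied. If every order is tied, the original output is used. The sketch below illustrates that reading. It is not the script that produced these counts: the function names, the token-list input and the clipped overlap counting are all assumptions.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list (empty if the list is too short)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def overlap(candidate, reference, n):
    """Number of n-grams shared with the reference, clipped per n-gram (assumed)."""
    shared = ngrams(candidate, n) & ngrams(reference, n)
    return sum(shared.values())


def classify(original, reordered, reference):
    """Assign one sentence to one category, backing off from 4-grams to unigrams."""
    if original == reordered:
        return "identical"
    for n in (4, 3, 2, 1):
        o = overlap(original, reference, n)
        r = overlap(reordered, reference, n)
        if o != r:
            winner = "original" if o > r else "reordered"
            return f"{winner}:{n}"
    # Same overlap at every order: the original output is kept.
    return "tie"


reference = "the cat sat on the mat".split()
original = "the cat on the mat sat".split()
reordered = "the cat sat on the mat".split()
print(classify(original, reordered, reference))
```

Under this reading, a sentence counted under "4-grams" was decided at the first comparison, while one counted under "unigrams" was tied at every higher order.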


COLLINS-HALF

Sentences identical: 246
Not identical but same number of n-gram overlaps (original used): 238
Original preferred based on overlap of
- 4-grams: 314
- trigrams: 109
- bigrams: 106
- unigrams: 95
( total: 624 )
Reordered preferred based on overlap of
- 4-grams: 467
- trigrams: 140
- bigrams: 141
- unigrams: 144
( total: 892 )


WMT09

Sentences identical: 203
Not identical but same number of n-gram overlaps (original used): 227
Original preferred based on overlap of
- 4-grams: 424
- trigrams: 124
- bigrams: 112
- unigrams: 86
( total: 746 )
Reordered preferred based on overlap of
- 4-grams: 494
- trigrams: 108
- bigrams: 120
- unigrams: 102
( total: 824 )


WMT09-DIST

Sentences identical: 186
Not identical but same number of n-gram overlaps (original used): 212
Original preferred based on overlap of
- 4-grams: 355
- trigrams: 113
- bigrams: 112
- unigrams: 93
( total: 673 )
Reordered preferred based on overlap of
- 4-grams: 548
- trigrams: 141
- bigrams: 137
- unigrams: 103
( total: 929 )


WMT09-DIST-NEWSTUNE

Sentences identical: 134
Not identical but same number of n-gram overlaps (original used): 221
Original preferred based on overlap of
- 4-grams: 433
- trigrams: 159
- bigrams: 131
- unigrams: 107
( total: 830 )
Reordered preferred based on overlap of
- 4-grams: 489
- trigrams: 114
- bigrams: 111
- unigrams: 101
( total: 815 )


WMT09-DIST-NOTUNE

Sentences identical: 259
Not identical but same number of n-gram overlaps (original used): 206
Original preferred based on overlap of
- 4-grams: 329
- trigrams: 121
- bigrams: 104
- unigrams: 82
( total: 636 )
Reordered preferred based on overlap of
- 4-grams: 474
- trigrams: 138
- bigrams: 144
- unigrams: 143
( total: 899 )


WMT09-NEWSTUNE

Sentences identical: 210
Not identical but same number of n-gram overlaps (original used): 245
Original preferred based on overlap of
- 4-grams: 375
- trigrams: 122
- bigrams: 103
- unigrams: 102
( total: 702 )
Reordered preferred based on overlap of
- 4-grams: 484
- trigrams: 129
- bigrams: 127
- unigrams: 103
( total: 843 )


WMT09-NOTUNE

Sentences identical: 252
Not identical but same number of n-gram overlaps (original used): 220
Original preferred based on overlap of
- 4-grams: 349
- trigrams: 120
- bigrams: 111
- unigrams: 82
( total: 662 )
Reordered preferred based on overlap of
- 4-grams: 450
- trigrams: 137
- bigrams: 147
- unigrams: 132
( total: 866 )


WMT09-5

Sentences identical: 203
Not identical but same number of n-gram overlaps (original used): 208
Original preferred based on overlap of
- 4-grams: 435
- trigrams: 131
- bigrams: 95
- unigrams: 88
( total: 749 )
Reordered preferred based on overlap of
- 4-grams: 491
- trigrams: 113
- bigrams: 125
- unigrams: 111
( total: 840 )


WMT09-DIST-5

Sentences identical: 207
Not identical but same number of n-gram overlaps (original used): 200
Original preferred based on overlap of
- 4-grams: 409
- trigrams: 122
- bigrams: 117
- unigrams: 111
( total: 759 )
Reordered preferred based on overlap of
- 4-grams: 519
- trigrams: 117
- bigrams: 108
- unigrams: 90
( total: 834 )


WMT09-DIST-NEWSTUNE-5

Sentences identical: 199
Not identical but same number of n-gram overlaps (original used): 235
Original preferred based on overlap of
- 4-grams: 347
- trigrams: 114
- bigrams: 101
- unigrams: 94
( total: 656 )
Reordered preferred based on overlap of
- 4-grams: 547
- trigrams: 137
- bigrams: 118
- unigrams: 108
( total: 910 )


WMT09-DIST-NOTUNE-5

Sentences identical: 271
Not identical but same number of n-gram overlaps (original used): 197
Original preferred based on overlap of
- 4-grams: 338
- trigrams: 126
- bigrams: 111
- unigrams: 79
( total: 654 )
Reordered preferred based on overlap of
- 4-grams: 478
- trigrams: 143
- bigrams: 126
- unigrams: 131
( total: 878 )


WMT09-NEWSTUNE-5

Sentences identical: 171
Not identical but same number of n-gram overlaps (original used): 214
Original preferred based on overlap of
- 4-grams: 424
- trigrams: 133
- bigrams: 92
- unigrams: 101
( total: 750 )
Reordered preferred based on overlap of
- 4-grams: 490
- trigrams: 134
- bigrams: 141
- unigrams: 100
( total: 865 )


WMT09-NOTUNE-5

Sentences identical: 268
Not identical but same number of n-gram overlaps (original used): 225
Original preferred based on overlap of
- 4-grams: 378
- trigrams: 125
- bigrams: 106
- unigrams: 72
( total: 681 )
Reordered preferred based on overlap of
- 4-grams: 448
- trigrams: 130
- bigrams: 123
- unigrams: 125
( total: 826 )


WMT09-HALF

Sentences identical: 215
Not identical but same number of n-gram overlaps (original used): 226
Original preferred based on overlap of
- 4-grams: 422
- trigrams: 112
- bigrams: 117
- unigrams: 98
( total: 749 )
Reordered preferred based on overlap of
- 4-grams: 450
- trigrams: 126
- bigrams: 136
- unigrams: 98
( total: 810 )


WMT09-DIST-HALF

Sentences identical: 181
Not identical but same number of n-gram overlaps (original used): 221
Original preferred based on overlap of
- 4-grams: 372
- trigrams: 105
- bigrams: 105
- unigrams: 102
( total: 684 )
Reordered preferred based on overlap of
- 4-grams: 518
- trigrams: 162
- bigrams: 134
- unigrams: 100
( total: 914 )


WMT09-DIST-NOTUNE-HALF

Sentences identical: 262
Not identical but same number of n-gram overlaps (original used): 229
Original preferred based on overlap of
- 4-grams: 330
- trigrams: 112
- bigrams: 105
- unigrams: 86
( total: 633 )
Reordered preferred based on overlap of
- 4-grams: 442
- trigrams: 159
- bigrams: 144
- unigrams: 131
( total: 876 )


WMT09-NOTUNE-HALF

Sentences identical: 249
Not identical but same number of n-gram overlaps (original used): 251
Original preferred based on overlap of
- 4-grams: 344
- trigrams: 110
- bigrams: 116
- unigrams: 92
( total: 662 )
Reordered preferred based on overlap of
- 4-grams: 445
- trigrams: 140
- bigrams: 139
- unigrams: 114
( total: 838 )


WMT09-25PC

Sentences identical: 236
Not identical but same number of n-gram overlaps (original used): 209
Original preferred based on overlap of
- 4-grams: 436
- trigrams: 115
- bigrams: 103
- unigrams: 101
( total: 755 )
Reordered preferred based on overlap of
- 4-grams: 451
- trigrams: 113
- bigrams: 137
- unigrams: 99
( total: 800 )


WMT09-DIST-25PC

Sentences identical: 193
Not identical but same number of n-gram overlaps (original used): 198
Original preferred based on overlap of
- 4-grams: 408
- trigrams: 110
- bigrams: 114
- unigrams: 98
( total: 730 )
Reordered preferred based on overlap of
- 4-grams: 523
- trigrams: 137
- bigrams: 131
- unigrams: 88
( total: 879 )


WMT09-DIST-NOTUNE-25PC

Sentences identical: 290
Not identical but same number of n-gram overlaps (original used): 206
Original preferred based on overlap of
- 4-grams: 353
- trigrams: 106
- bigrams: 117
- unigrams: 85
( total: 661 )
Reordered preferred based on overlap of
- 4-grams: 429
- trigrams: 143
- bigrams: 138
- unigrams: 133
( total: 843 )


WMT09-NOTUNE-25PC

Sentences identical: 268
Not identical but same number of n-gram overlaps (original used): 223
Original preferred based on overlap of
- 4-grams: 393
- trigrams: 110
- bigrams: 99
- unigrams: 94
( total: 696 )
Reordered preferred based on overlap of
- 4-grams: 424
- trigrams: 127
- bigrams: 125
- unigrams: 137
( total: 813 )


WMT09-10PC

Sentences identical: 207
Not identical but same number of n-gram overlaps (original used): 234
Original preferred based on overlap of
- 4-grams: 420
- trigrams: 114
- bigrams: 100
- unigrams: 99
( total: 733 )
Reordered preferred based on overlap of
- 4-grams: 448
- trigrams: 149
- bigrams: 121
- unigrams: 108
( total: 826 )


WMT09-DIST-10PC

Sentences identical: 175
Not identical but same number of n-gram overlaps (original used): 202
Original preferred based on overlap of
- 4-grams: 417
- trigrams: 113
- bigrams: 97
- unigrams: 105
( total: 732 )
Reordered preferred based on overlap of
- 4-grams: 510
- trigrams: 152
- bigrams: 138
- unigrams: 91
( total: 891 )


WMT09-DIST-NOTUNE-10PC

Sentences identical: 243
Not identical but same number of n-gram overlaps (original used): 220
Original preferred based on overlap of
- 4-grams: 356
- trigrams: 118
- bigrams: 112
- unigrams: 89
( total: 675 )
Reordered preferred based on overlap of
- 4-grams: 425
- trigrams: 142
- bigrams: 150
- unigrams: 145
( total: 862 )


WMT09-NOTUNE-10PC

Sentences identical: 250
Not identical but same number of n-gram overlaps (original used): 228
Original preferred based on overlap of
- 4-grams: 415
- trigrams: 106
- bigrams: 111
- unigrams: 88
( total: 720 )
Reordered preferred based on overlap of
- 4-grams: 410
- trigrams: 129
- bigrams: 134
- unigrams: 129
( total: 802 )


WMT09 NEWS TEST

Sentences identical: 254
Not identical but same number of n-gram overlaps (original used): 437
Original preferred based on overlap of
- 4-grams: 348
- trigrams: 199
- bigrams: 252
- unigrams: 187
( total: 986 )
Reordered preferred based on overlap of
- 4-grams: 291
- trigrams: 170
- bigrams: 217
- unigrams: 170
( total: 848 )


WMT09-DIST NEWS TEST

Sentences identical: 294
Not identical but same number of n-gram overlaps (original used): 456
Original preferred based on overlap of
- 4-grams: 283
- trigrams: 201
- bigrams: 221
- unigrams: 216
( total: 921 )
Reordered preferred based on overlap of
- 4-grams: 298
- trigrams: 159
- bigrams: 208
- unigrams: 189
( total: 854 )


WMT09-DIST-NEWSTUNE NEWS TEST

Sentences identical: 213
Not identical but same number of n-gram overlaps (original used): 481
Original preferred based on overlap of
- 4-grams: 349
- trigrams: 242
- bigrams: 252
- unigrams: 164
( total: 1007 )
Reordered preferred based on overlap of
- 4-grams: 242
- trigrams: 153
- bigrams: 200
- unigrams: 229
( total: 824 )


WMT09-DIST-NOTUNE NEWS TEST

Sentences identical: 526
Not identical but same number of n-gram overlaps (original used): 534
Original preferred based on overlap of
- 4-grams: 235
- trigrams: 158
- bigrams: 192
- unigrams: 168
( total: 753 )
Reordered preferred based on overlap of
- 4-grams: 157
- trigrams: 116
- bigrams: 181
- unigrams: 258
( total: 712 )


WMT09-NEWSTUNE NEWS TEST

Sentences identical: 294
Not identical but same number of n-gram overlaps (original used): 549
Original preferred based on overlap of
- 4-grams: 345
- trigrams: 204
- bigrams: 245
- unigrams: 174
( total: 968 )
Reordered preferred based on overlap of
- 4-grams: 204
- trigrams: 141
- bigrams: 175
- unigrams: 194
( total: 714 )


WMT09-NOTUNE NEWS TEST

Sentences identical: 404
Not identical but same number of n-gram overlaps (original used): 458
Original preferred based on overlap of
- 4-grams: 278
- trigrams: 197
- bigrams: 232
- unigrams: 195
( total: 902 )
Reordered preferred based on overlap of
- 4-grams: 196
- trigrams: 159
- bigrams: 170
- unigrams: 236
( total: 761 )


WMT09-5 NEWS TEST

Sentences identical: 229
Not identical but same number of n-gram overlaps (original used): 448
Original preferred based on overlap of
- 4-grams: 345
- trigrams: 195
- bigrams: 203
- unigrams: 145
( total: 888 )
Reordered preferred based on overlap of
- 4-grams: 342
- trigrams: 170
- bigrams: 236
- unigrams: 212
( total: 960 )


WMT09-DIST-5 NEWS TEST

Sentences identical: 313
Not identical but same number of n-gram overlaps (original used): 474
Original preferred based on overlap of
- 4-grams: 308
- trigrams: 205
- bigrams: 222
- unigrams: 204
( total: 939 )
Reordered preferred based on overlap of
- 4-grams: 266
- trigrams: 164
- bigrams: 206
- unigrams: 163
( total: 799 )


WMT09-DIST-NEWSTUNE-5 NEWS TEST

Sentences identical: 426
Not identical but same number of n-gram overlaps (original used): 548
Original preferred based on overlap of
- 4-grams: 253
- trigrams: 173
- bigrams: 224
- unigrams: 213
( total: 863 )
Reordered preferred based on overlap of
- 4-grams: 197
- trigrams: 123
- bigrams: 169
- unigrams: 199
( total: 688 )


WMT09-DIST-NOTUNE-5 NEWS TEST

Sentences identical: 520
Not identical but same number of n-gram overlaps (original used): 519
Original preferred based on overlap of
- 4-grams: 258
- trigrams: 177
- bigrams: 207
- unigrams: 159
( total: 801 )
Reordered preferred based on overlap of
- 4-grams: 158
- trigrams: 119
- bigrams: 171
- unigrams: 237
( total: 685 )


WMT09-NEWSTUNE-5 NEWS TEST

Sentences identical: 225
Not identical but same number of n-gram overlaps (original used): 453
Original preferred based on overlap of
- 4-grams: 388
- trigrams: 222
- bigrams: 250
- unigrams: 188
( total: 1048 )
Reordered preferred based on overlap of
- 4-grams: 272
- trigrams: 152
- bigrams: 176
- unigrams: 199
( total: 799 )


WMT09-NOTUNE-5 NEWS TEST

Sentences identical: 382
Not identical but same number of n-gram overlaps (original used): 472
Original preferred based on overlap of
- 4-grams: 310
- trigrams: 213
- bigrams: 234
- unigrams: 172
( total: 929 )
Reordered preferred based on overlap of
- 4-grams: 202
- trigrams: 147
- bigrams: 188
- unigrams: 205
( total: 742 )
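
Note on checking a block: within a block, the four per-order counts should add up to the stated total, and the four categories together should account for every test sentence. The sketch below parses one block in the layout used above and checks both, using the COLLINS figures. The helper name `parse_block` and the regular expressions are assumptions tied to this exact layout.

```python
import re

COLLINS = """\
Sentences identical: 230
Not identical but same number of n-gram overlaps (original used): 233
Original preferred based on overlap of
- 4-grams: 324
- trigrams: 108
- bigrams: 107
- unigrams: 86
( total: 625 )
Reordered preferred based on overlap of
- 4-grams: 477
- trigrams: 150
- bigrams: 150
- unigrams: 135
( total: 912 )
"""


def parse_block(text):
    """Extract the counts of one system block (hypothetical helper)."""
    identical = int(re.search(r"Sentences identical: (\d+)", text).group(1))
    tie = int(re.search(r"\(original used\): (\d+)", text).group(1))
    # The "( total: N )" lines appear in order: original first, then reordered.
    original, reordered = (int(t) for t in re.findall(r"\( total: (\d+) \)", text))
    # Eight per-order lines: four for the original, then four for the reordered output.
    levels = [int(v) for v in re.findall(r"^- \S+: (\d+)$", text, flags=re.M)]
    return {
        "identical": identical,
        "tie": tie,
        "original": original,
        "reordered": reordered,
        "levels_original": levels[:4],
        "levels_reordered": levels[4:],
    }


counts = parse_block(COLLINS)
assert sum(counts["levels_original"]) == counts["original"]
assert sum(counts["levels_reordered"]) == counts["reordered"]
sentences = sum(counts[k] for k in ("identical", "tie", "original", "reordered"))
print(sentences)  # 2000
```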
