@inproceedings{ishihara-2020-influence,
    title = "The Influence of Background Data Size on the Performance of a Score-based Likelihood Ratio System: A Case of Forensic Text Comparison",
    author = "Ishihara, Shunichi",
    editor = "Kim, Maria  and
      Beck, Daniel  and
      Mistica, Meladel",
    booktitle = "Proceedings of the 18th Annual Workshop of the Australasian Language Technology Association",
    month = dec,
    year = "2020",
    address = "Virtual Workshop",
    publisher = "Australasian Language Technology Association",
    url = "https://preview.aclanthology.org/ingest-emnlp/2020.alta-1.3/",
    pages = "21--31",
    abstract = "This study investigates the robustness and stability of a likelihood ratio{--}based (LR-based) forensic text comparison (FTC) system against the size of background population data. Focus is centred on a score-based approach for estimating authorship LRs. Each document is represented with a bag-of-words model, and the Cosine distance is used as the score-generating function. A set of population data that differed in the number of scores was synthesised 20 times using the Monte Carlo simulation technique. The FTC system{'}s performance with different population sizes was evaluated by a gradient metric of the log{--}LR cost (Cllr). The experimental results revealed two outcomes: 1) that the score-based approach is rather robust against a small population size{---}in that, with the scores obtained from the 40{--}60 authors in the database, the stability and the performance of the system become fairly comparable to the system with a maximum number of authors (720); and 2) that poor performance in terms of Cllr, which occurred because of limited background population data, is largely due to poor calibration. The results also indicated that the score-based approach is more robust against data scarcity than the feature-based approach; however, this finding obliges further study."
}