Crispin Almodovar


2022

Can Language Models Help in System Security? Investigating Log Anomaly Detection using BERT
Crispin Almodovar | Fariza Sabrina | Sarvnaz Karimi | Salahuddin Azad
Proceedings of the 20th Annual Workshop of the Australasian Language Technology Association

The log files generated by networked computer systems contain valuable information that can be used to monitor system security and stability. Recently, techniques based on Deep Learning and Natural Language Processing have proven effective in detecting anomalous activities from system logs. The current approaches, however, have limited practical application because they either rely on log templates, which cannot handle variability in log content, or require supervised training to be effective. In this paper, a novel log anomaly detection approach named LogFiT is proposed. The LogFiT model inherits the linguistic “knowledge” encoded within a pretrained BERT-based language model and fine-tunes it towards learning the linguistic structure of system logs. The LogFiT model is trained in a self-supervised manner using normal log data only. Using masked token prediction and centroid distance minimisation as training objectives, the LogFiT model learns to recognise the linguistic patterns associated with the normal log data. During inference, a discriminator function uses the LogFiT model’s top-k token prediction accuracy and computed centroid distance to determine whether the input is normal or anomalous. Experiments show that LogFiT’s F1 score and specificity exceed those of baseline models on the HDFS dataset and are comparable on the BGL dataset.
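
The inference step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the Euclidean distance metric, the two thresholds, and the rule that flags an input when either signal is out of range are all assumptions made for illustration, since the abstract only states that top-k token prediction accuracy and centroid distance are both used.

```python
import math
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   targets: Sequence[int], k: int) -> float:
    """Fraction of masked positions whose true token id is among the
    k highest-scoring predictions for that position."""
    if not targets or len(scores) != len(targets):
        raise ValueError("scores and targets must be non-empty and equal in length")
    hits = 0
    for row, target in zip(scores, targets):
        # Indices of the k largest scores in this row.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += target in top_k
    return hits / len(targets)


def centroid_distance(embedding: Sequence[float],
                      centroid: Sequence[float]) -> float:
    """Euclidean distance to the centroid of normal data (metric is an assumption)."""
    return math.dist(embedding, centroid)


def is_anomalous(accuracy: float, distance: float,
                 min_accuracy: float, max_distance: float) -> bool:
    """Hypothetical decision rule: flag the input if the model predicts its
    masked tokens poorly OR its embedding lies far from the normal centroid."""
    return accuracy < min_accuracy or distance > max_distance


# Toy example: two masked positions over a three-token vocabulary.
scores = [[0.1, 0.7, 0.2], [0.6, 0.3, 0.1]]
targets = [1, 2]
accuracy = top_k_accuracy(scores, targets, k=1)
distance = centroid_distance([3.0, 4.0], [0.0, 0.0])
flagged = is_anomalous(accuracy, distance, min_accuracy=0.9, max_distance=10.0)
```

In this toy input only one of the two masked tokens is recovered, so the accuracy falls below the hypothetical 0.9 threshold and the sequence is flagged even though its embedding is close to the centroid. In practice both thresholds would have to be tuned on held-out normal logs.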