# Annotation Instructions

## Overview

This project entails writing summaries of at most 3 sentences based on MUC-4 templates. The aim is to write one summary per template that captures all of the information contained in that template. In other words, someone reading the summary should be able to reconstruct exactly the template on which it was based, and nothing more.

At minimum, each template you see will have the following two slots:
- `Event Type`: the type of incident being described. Takes one of the following values: `attack`, `arson`, `bombing`, `forced work stoppage`, `kidnapping`, `robbery`.
- `Stage of Completion`: whether the incident has been `accomplished`, `attempted` (but not fully accomplished), or merely `threatened`. 

Additionally, most templates will contain a significant subset of the following slots:
- `Individual Perpetrators`: the individuals responsible for the incident (e.g. the kidnappers in a `kidnapping`).
- `Organizations Responsible`: the organization(s) (e.g. terrorist groups) responsible for the incident.
- `Victims`: those harmed, killed, or otherwise victimized by the incident.
- `Physical Targets`: buildings, vehicles, and infrastructure damaged or destroyed during the incident.
- `Weapons`: weapons used by the perpetrators during the incident.
- `Location`: the location where the incident took place.
- `Date`: the date the incident took place.

For each of the above slots except `Location` and `Date`, the values will be a comma-separated list of strings, where each string is a representative mention of a distinct entity that satisfies the role associated with that slot. Your summary should include all such mentions **exactly as they are written**, modulo capitalization (i.e. you are encouraged to capitalize names and proper nouns in your summary).

## Locations and Dates

Your summary should also include information about the location and date of the incident described in the template, **but only if it is explicitly mentioned in the text**. The difficulty is that the official `Location` and `Date` slots often contain more information about the location and date of an incident than can actually be extracted directly from the text, and for now we are interested only in what can be extracted directly. For this reason, the JSON entry for each example contains two additional keys, `edited date` and `edited location`, meant to capture this information. Unlike all of the slots above, these fields will always be empty to start. You are to fill them in with any explicit mentions in the text that denote the date and location of the incident, and your summary must include every mention that you add.

A few notes on the `edited date` field:
- All explicit dates should be written exactly in the form in which they appear in the text, even if this is somewhat less natural than what you would normally write. This form is often `<day> <month>` or `<day> <month> <year>` (e.g. `14 February`, `2 April 1989`).
- Deictic date expressions (e.g. `Today`, `Yesterday`, `Last Week`, `This Morning`) are acceptable as well.
- Even if more fine-grained temporal information is provided (e.g. the exact hour and minute of an attack), it should *not* be annotated.
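Because dates must be copied exactly as they appear, it can help to scan a document for candidate spans before filling in `edited date`. The following is a minimal sketch of such a pre-screening helper, assuming dates follow the `<day> <month>` / `<day> <month> <year>` pattern described above; `candidate_dates` and the short list of deictic expressions are hypothetical and not part of the project's tooling. It returns spans verbatim and ignores clock times, but every candidate still needs a manual check that it actually refers to the incident in the template.

```python
import re

# Month names; MUC-4 texts are often all upper case, so matching ignores case.
MONTHS = (
    "january|february|march|april|may|june|july|august|"
    "september|october|november|december"
)

# "<day> <month>" with an optional "<year>", e.g. "14 February" or "2 April 1989".
# Clock times such as "0300" are deliberately not matched.
DATE_PATTERN = re.compile(rf"\b\d{{1,2}} (?:{MONTHS})(?: \d{{4}})?\b", re.IGNORECASE)

# Illustrative, non-exhaustive list of deictic expressions (an assumption).
DEICTIC_PATTERN = re.compile(
    r"\b(?:today|yesterday|last week|this morning)\b", re.IGNORECASE
)


def candidate_dates(document_text: str) -> list[str]:
    """Return date-like spans exactly as written, in order of appearance."""
    matches = [
        m
        for pattern in (DATE_PATTERN, DEICTIC_PATTERN)
        for m in pattern.finditer(document_text)
    ]
    matches.sort(key=lambda m: m.start())
    # Drop exact repeats while keeping the first occurrence of each span.
    return list(dict.fromkeys(m.group(0) for m in matches))
```

Note that a regex like this will also flag false positives (e.g. "12 may" used as a verb) and will miss dates written in other forms, so it only narrows down where to look.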

A few notes on the `edited location` field:
- Pretty much any expression denoting where the incident occurred is fair game here: names of countries, towns, streets, etc.
  - A caveat: any location that appears as a value of the `Physical Targets` slot **should not** also appear as a value in `edited location`.
- Most often, these expressions will be a subset of the ones marked in the `Location` field.
  - Critically, however, not every expression in the `Location` field will be explicitly mentioned in the text. For instance, the `Location` field may have the value `colombia, antioquia (department)`, but the document text may mention only `antioquia`. In this case, only `Antioquia` should be included in the `edited location` field.
  - In the `Location` slot, you will often see place names that feature parenthetical information about the kind of entity they denote (e.g. `antioquia (department)` means that Antioquia is a department). When adding mentions to the `edited location` field, you should omit this parenthetical information (e.g. annotate only `Antioquia`).
  - Sometimes, contextual clues will suggest that an incident occurred in a particular location without explicitly saying so. For instance, maybe a "Salvadoran presidential candidate" is murdered, implying that this took place in El Salvador. Or perhaps you can infer in what country the incident took place (e.g. El Salvador) by virtue of a city or town (e.g. San Salvador) that *is* explicitly referenced and that you know to be situated in that country. In such cases, you should still only add the places that are *explicitly* mentioned (e.g. San Salvador).
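The mechanical parts of these rules (drop parentheticals, keep only explicitly mentioned places, exclude `Physical Targets` values) can be pre-screened automatically. The following is a minimal sketch, assuming the `Location` value is a single comma-separated string like `colombia, antioquia (department)` and that `Physical Targets` is available as a list of strings; the function names are hypothetical. Because it relies on exact string matching, it misses paraphrased or abbreviated mentions, never proposes places absent from the `Location` field, and cannot judge whether a mention really describes where the incident happened, so its output is only a starting point.

```python
import re


def strip_parenthetical(place: str) -> str:
    """Remove type annotations, e.g. 'antioquia (department)' -> 'antioquia'."""
    return re.sub(r"\s*\([^)]*\)", "", place).strip()


def candidate_locations(
    location_slot: str, document_text: str, physical_targets: list[str]
) -> list[str]:
    """Suggest `edited location` values from the official `Location` slot.

    Assumes `location_slot` is one comma-separated string such as
    "colombia, antioquia (department)".
    """
    targets = {t.strip().lower() for t in physical_targets}
    candidates = []
    for part in location_slot.split(","):
        name = strip_parenthetical(part)
        if not name or name.lower() in targets:
            continue  # empty, or already a Physical Targets value
        # Keep the place only if it appears as a whole phrase in the text.
        pattern = rf"(?<!\w){re.escape(name)}(?!\w)"
        if re.search(pattern, document_text, flags=re.IGNORECASE):
            candidates.append(name.title())  # capitalization still worth a manual check
    return candidates
```

For the example above, `candidate_locations("colombia, antioquia (department)", text, [])` would suggest only `Antioquia` when the text never mentions Colombia.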

In general, it can help to consult the `Date` and `Location` fields first to get a *rough* idea of what to look for when annotating the `edited date` and `edited location` fields. Often, however, you will not find a mention in the text for one or both of these fields. In such cases, leave the corresponding field blank.

## Details

The files to be annotated are in JSON format. The top-level keys of each JSON file are document IDs. The value associated with each document is a list, where each element in the list corresponds to a distinct template and has the following fields:

- `instance_id`: A unique identifier for the current example. This has the form `<document ID>.<template ID>`.
- `document`: The complete (sentence-split) document text.
- `template`: The template for which the summary is to be written.
- `edited location`: See above.
- `edited date`: See above.
- `summary`: The (sentence-split) summary for the given template.
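A minimal sketch of walking one of these files, assuming only the structure described above (document IDs mapping to lists of entries, each with an `instance_id` of the form `<document ID>.<template ID>`). The function names are hypothetical, and splitting on the last `.` assumes that template IDs themselves contain no dots.

```python
import json
from collections.abc import Iterator


def load_annotation_file(path: str) -> dict:
    """Read one annotation file: document IDs mapped to lists of template entries."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def iter_entries(data: dict) -> Iterator[tuple[str, str, dict]]:
    """Yield (document ID, template ID, entry) for every template entry in `data`."""
    for doc_id, entries in data.items():
        for entry in entries:
            # instance_id has the form "<document ID>.<template ID>"; splitting on the
            # last "." assumes the template ID itself contains no dots.
            prefix, _, template_id = entry["instance_id"].rpartition(".")
            if prefix != doc_id:
                raise ValueError(
                    f"instance_id {entry['instance_id']!r} does not match document {doc_id!r}"
                )
            yield doc_id, template_id, entry


# Hypothetical usage (the file name is a placeholder):
# for doc_id, template_id, entry in iter_entries(load_annotation_file("some_file.json")):
#     print(entry["instance_id"], entry["edited location"], entry["edited date"])
```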

The `summary` field for each entry will be pre-populated with a candidate summary generated by ChatGPT. In most cases, this candidate will need at least some editing to guarantee that
1. all the mentions listed in the `template` actually appear in the summary
2. all the mentions you add to the `edited location` and `edited date` fields also appear in the summary
3. the summary does not contain information about some *other* template for the same document
4. the summary is not longer than 3 sentences.

Additionally, you may find that some of the summaries contain fairly unnatural or stilted language (e.g. "the attack was carried out by individual perpetrators"); this should also be corrected so that the summary reads as naturally and fluidly as possible. However, if the candidate summary satisfies all the above constraints *and* sounds natural to you, feel free to leave it as is.

On point (3) above, you may still include some additional information in the summary if it offers important context or otherwise helps clarify the situation for a reader. The key constraints the summary must satisfy are simply that
1. one should not be able to conflate it with the summary for another template
2. upon reading it, one should be able to recover all and only the fillers in the target template (plus the values for the `edited location` and `edited date` fields).
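The purely mechanical requirements (every template mention and every `edited location`/`edited date` value appears in the summary, and the summary is at most 3 sentences) can be checked automatically before a final read. A minimal sketch, assuming the sentence-split `summary` is a list of strings and that you have already gathered the template's mention strings (from every slot except `Event Type`, `Stage of Completion`, `Location`, and `Date`) and the edited values into plain lists; `check_summary` is a hypothetical name, not an existing project tool. Matching ignores case, as the instructions allow, but requires each mention to appear as a whole phrase, so `gun` is not satisfied by `guns`.

```python
import re


def check_summary(
    summary_sentences: list[str],
    template_mentions: list[str],
    edited_location: list[str],
    edited_date: list[str],
    max_sentences: int = 3,
) -> list[str]:
    """Return human-readable problems; an empty list means the mechanical checks pass."""
    problems = []
    # Length limit: at most `max_sentences` sentences.
    if len(summary_sentences) > max_sentences:
        problems.append(
            f"summary has {len(summary_sentences)} sentences (max {max_sentences})"
        )
    summary = " ".join(summary_sentences)
    # Every mention must appear verbatim (ignoring case), not as part of a longer word.
    for mention in [*template_mentions, *edited_location, *edited_date]:
        pattern = rf"(?<!\w){re.escape(mention)}(?!\w)"
        if not re.search(pattern, summary, flags=re.IGNORECASE):
            problems.append(f"missing mention: {mention!r}")
    return problems
```

Whether the summary could be confused with another template's summary, and whether it reads naturally, still has to be judged by hand.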

Finally, a note on the event type. While the summary does not need to explicitly name the type of event being described (e.g. "attack", "kidnapping"), it should be clear to someone familiar with the MUC-4 event ontology which event type is being described.

You are strongly encouraged to look at some completed examples in either the dev or test split, both of which can be found under `data/muc/edited/`. The training data to be annotated is also there, under `data/muc/edited/train`.

Please feel free to reach out with any questions.