Social scientists rely on event data from news to quantitatively study the behavior of political actors. Event data are extracted manually or with the help of traditional natural language processing (NLP) technology. We strive to push forward the state of the art in automated event data extraction.
Our interest lies in the public protest domain. This includes events like strikes, demonstrations, riots, terrorist attacks, on-line campaigns, and symbolic protest actions (e.g. shoe throwing). We want to learn about protest forms, actors, locations and times, issues and intensity, and the evolution of protest stories over time. From a news document, like so:
Trade unions are satisfied with the course of today's blockade of the Czech-Slovak Drietoma-Stary Hrozenkov border crossing aimed to highlight bad social and economic conditions in Slovakia [...]
Czech News Agency, 2 March 2001
we would like to obtain structured information about a protest event:
action form | location | date | actor | issue | ... |
---|---|---|---|---|---|
blockade | Slovakia | 02.03.2001 | trade unions | welfare | ... |
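
As a rough illustration, the structured record above could be held in a small data structure. The sketch below is only an assumption about how such a record might look in Python: the class name `ProtestEvent` and its field names are hypothetical and are not part of the coding interface.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class ProtestEvent:
    """One coded protest event (hypothetical field names, for illustration only)."""
    action_form: str
    location: str
    start: date  # first day of the event
    end: date    # last day of the event (same as start for one-day events)
    actors: List[str] = field(default_factory=list)
    issues: List[str] = field(default_factory=list)


# The blockade from the Czech News Agency excerpt above.
blockade = ProtestEvent(
    action_form="blockade",
    location="Slovakia",
    start=date(2001, 3, 2),
    end=date(2001, 3, 2),
    actors=["trade unions"],
    issues=["welfare"],
)
```

Storing the date as a start and an end is a design choice made here so that events spanning several days fit the same record.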
Our goal is to construct a corpus of news documents together with reliable annotations of any protest events occurring in them. There are two kinds of annotation that we need:
Why do we need it? For one thing, this corpus could be used for the evaluation of automated event extraction systems. For another, it can be used for building supervised learning systems that automatically extract protest events. In this approach, one would use the corpus to estimate a general function from news documents to annotations, which would allow for predicting annotations for any unseen news documents.
Social scientists commonly annotate events at the document level: Event annotation (a structured representation of event data) is attached to the document. Linguists often use word-level (more technically, token-level) annotation in which meta-data are attached to words in the document. We shall refer to the latter kind of annotation using the linguistic term token-level annotation. We shall refer to document-level annotation as coding, to be consistent with the social science terminology.
In traditional event coding, the job of the coder is to detect and characterize events without indicating words that constitute textual evidence for such characterization, or relations between such words. In contrast, we would like you to mark the words that guide your coding of text.
Our task tests the limits of NLP technology. The challenge is that bits and pieces of information on a protest event are typically scattered about a news story. Automatically making sense of such chunks of information is far from trivial, which is why we want to be as explicit as possible about the annotation decisions that you as coder will have to make.
Consistency is the key.
In a team of coders, each person will inevitably interpret a piece of text differently from the others. This is fine. Not only is this fine, this is normal and is a fact of life. We address this by providing annotation instructions that should prepare you for a large part of your future annotation decisions. You might find many instructions too obvious to have to be spelled out; we simply want to take every precaution.
The more consistent the annotations are across coders, the less confused the future algorithm will be, and the more certain (and happy) we are that all coders are doing the same thing.
An important thing to remember for the future is that you should not hesitate to contact your supervisor if you think any important annotation case is not adequately covered in the guidelines. In this way, you help us ensure consistency.
Explicit is better than implicit. (from the Zen of Python)
The algorithm that we will build to make sense of annotations can pretty much only count stuff or check if a thing is there or not. It is good at this but incapable of understanding the subtleties of language and implicit information.
This is why, if the annotation decision that you are about to make involves guessing or any complex inference, we ask you to refrain from it. Whenever you are not certain or have to think too much, assume that the algorithm will surely fail here. We ask you to produce the bare minimum of annotations needed to make your point with the greatest certainty.
Do not miss the forest for the trees.
Annotating at the word level can be confusing. Do not fall into the trap set by the news writer. It is their job to paint vivid images of events. If you read an account of three windows having been broken at a demonstration, it is likely that you need to annotate one event of political violence (not three) and one demonstration.
It might be helpful to think of word-level annotation as a process of locating solid textual evidence for protest events that you identify after reading the entire story.
Be exhaustive.
Everything that can be reliably annotated should be annotated.
We shall first introduce our understanding of a protest event. Then, we shall look at the interface that allows for the combined annotation of protest events at the document and word levels. Finally, we look into coding rules for protest events: their action forms, location, time, size, issues.
This outlines the requirements for protest events that we would like to have coded.
Protest events are politically motivated actions open to the public and not institutionalized, as opposed to e.g. elections. Concretely, a codable protest event should satisfy the following requirements:
A codable protest event falls into exactly one of the action form categories from this Table. The table contains all action form categories that we are interested in, along with examples of action forms.
1st-level category | 2nd-level category | Explanation | Examples |
---|---|---|---|
PROTEST | A residual category: action forms that do not fall under any other category | ||
Petition | Are protest actions related to the collection of individual support through signatures, letters, online campaigns. | petition, letter campaign, collecting signatures | |
Strike | Are actions that involve refusal to work. All forms of work stoppage pertain to the broad category of strikes. N.B. We count any instances of picketing by strikers as part of the larger strike event. | general strike, industrial strike, wildcat strike, work-to-rule strike, work stoppage, walking out of the job | |
Demonstration | Are contentious gatherings taking place in public spaces and making political claims. We also count as demonstrations all non-confrontational, symbolic actions that take place at contentious gatherings. | demonstration, march, rally, vigil, protest meeting, protest gathering; and chanting slogans, carrying posters, waving flags, beating drums; and street theatre, human chain, mock funerals | |
Confrontational non-violent action | A residual category for confrontational non-violent protest actions | boycott, cyber-attack, refusal of payment, tax strike, rent strike, hacking, whistleblowing, disclosure of classified documents and information | |
Occupation or blockade | Are protest actions related to the occupation of public and private spaces as a form of political resistance | Occupation of buildings, squat, protest camp, sit-in, siege, barricade, blockade of a road, chaining oneself to a tree, slowing down traffic | |
Symbolic violence | Symbolic violence against objects or persons | throwing of paint bombs, tomatoes, eggs, burning of books, flags; Nazi salute, painting of anti-semitic slogans, swastikas | |
Political violence | A residual category for violent protest actions | riots, clashes with police, burning tires, burning cars, breaking shop windows, throwing stones and Molotov cocktails, vandalism, arson, assault of persons, hijacking, kidnapping; and death threats, bomb threats | |
Self-inflicted violence | Are protest events that involve harm only to one's own physical integrity | Hunger strike, self-immolation. N.B. This does not include suicide bombings | |
Gruesome violence | Are premeditated actions that cause gruesome harm (maiming, death) to unprepared individuals | bombing, suicide bombing, assassination, acid throwing, knife attack |
Beyond the action form requirement, we consider only events that are politically motivated. Again, politically motivated is understood broadly as covering all kinds of activities aimed at some political outcome. This is far from a precise definition but it should exclude all forms of public gatherings or forms of violence that do not have any political meaning whatsoever. You can ask the following question: Did the actors have a grievance or claim about the desirability of change in society?
Protest-like actions by state officials that act in their official capacity should not be considered codable protest events. A common example would be an official (e.g. a president or mayor) who refuses to perform their duties (a president refuses to stand up or shake hands, a mayor refuses to marry a gay couple, etc.). This does not exclude state officials or political parties from being involved in public protest (e.g. by organizing a demonstration).
We do not require a minimum number of participants (e.g., one-person pickets and hunger strikes are valid codable events).
We only code those events that are reported as clearly having taken place or happening at the time of writing.
Please note that speech acts alone (we demand, we object, etc.) are not codable events, with the exception of bomb threats, death threats, and similar. That said, speech acts can be part of a codable event, e.g. demanding or chanting as part of a demonstration.
Please note that hooliganism and sport-related violence are not political violence and so do not count as protest events.
In general, forms of political violence are sometimes hard to distinguish from ordinary criminal acts. For example, ethnic and racial violence incidents are often not linked to explicit claims. Therefore, we propose looking at the target of the action to decide whether this qualifies as protest. The target of a protest action could be a member of a minority / ethnic / religious group; a public official or state representative; a member of a political party or civil society organization. In addition, it could be property or belongings of any of the above mentioned groups.
We take an electoral rally to be a major event of institutional politics and not a protest event.
We code only asserted past events and currently unfolding events. Codable events are concrete enough. We shall not code
For example, we shall not code:
Which events we code:
n days ago and is set to continue into the future. For such unfolding events, the time will partially lie in the future with respect to the time of writing. For a strike that started Monday and will continue until Friday, we shall code the date range as Monday to Friday.
A protest action is coded as one separate event if
Below we provide more specific explanations of these four key points.
If a protest event is recurring (e.g. every Sunday or the 31st of every month), each instance counts as a separate event.
Protest events can last over extended periods of time (e.g., an occupation of a public square or a blockade of a port). An interruption of more than 24 hours (e.g. the occupiers return after an eviction) marks the boundary between protest events: In this case, the interrupted action is coded as two separate events.
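
The 24-hour rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `split_into_events` is hypothetical, the timestamps are made up, and in practice news reports rarely give times this precisely.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Interval = Tuple[datetime, datetime]


def split_into_events(intervals: List[Interval],
                      max_gap: timedelta = timedelta(hours=24)) -> List[Interval]:
    """Merge periods of protest activity into events.

    An interruption of more than `max_gap` starts a new event;
    a shorter interruption keeps the action within the same event.
    """
    events: List[Interval] = []
    for start, end in sorted(intervals):
        if events and start - events[-1][1] <= max_gap:
            # Short interruption: extend the current event.
            events[-1] = (events[-1][0], max(events[-1][1], end))
        else:
            # First period, or an interruption of more than 24 hours.
            events.append((start, end))
    return events


# Made-up occupation of a square with two interruptions.
occupation = [
    (datetime(2012, 5, 1, 9), datetime(2012, 5, 3, 18)),
    (datetime(2012, 5, 4, 8), datetime(2012, 5, 5, 12)),   # 14 hours after eviction
    (datetime(2012, 5, 8, 10), datetime(2012, 5, 9, 10)),  # almost 3 days later
]
events = split_into_events(occupation)  # two separate events
```

Here the first two periods are merged into one event because the interruption lasts only 14 hours, while the return after almost three days is coded as a second event.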
The finest level of location granularity that we are interested in is a human settlement. In practice, this means that a town, village, or city is a valid location whereas a specific street, park, square, or neighborhood is an over-specification.
However, some protest actions happen at locations other than settlements e.g. border crossings or roads (in a blockade or a terrorist attack). We shall consider such locations as equal to human settlements.
Often, protest events are organized jointly in multiple locations on the same date and around the same issues (e.g. demonstrations in all big cities of one country). According to our rules, all of them should be coded as separate events if specific locations are mentioned. For example:
A few hundred people participated in similar demonstrations in Hamburg, Frankfurt, and Munich.
For this excerpt from a news document, we shall code 3 events taking place at 3 different locations.
A protest can be reported at a geographical unit larger than a human settlement (e.g. country or a region like Bavaria) and comprise events in multiple city-level locations, all unified by one theme or one set of issues, e.g. strikes continue to affect people on holidays in Greece. We shall code such a protest event only if the document does not mention all the events at city-level locations that are part of this event. For example:
Protests across the Netherlands […] Demonstrations took place in Amsterdam, Rotterdam and Den Haag.
For this excerpt, we should code 4 events -- one protest located in the Netherlands and three demonstrations in Amsterdam, Rotterdam, and Den Haag -- provided we are certain that the three demonstrations are not the only protest events that make up the protest across the Netherlands.
Depending on the action form category, a codable event could in principle comprise multiple smaller events/activities. A demonstration comes with any number of activities (marching in the streets, waving banners, cheering support to demonstration leaders, singing protest songs, etc.), and so does a riot (clashes with police, pelting stones at policemen / property, igniting fire bombs, overturning cars, burning flags, etc.). In the end, what we would like to code is simply one demonstration or one riot and not all its sub-events individually. If protest actions are performed by the same group of people (or a subset of them) with the same goal and all fall into the same action form category (see this Table), we do not code them as separate events. For example:
Riot police clashed with protesters who were burning tyres and breaking shop windows.
We code one event of political violence (riots).
However, if a sub-event falls into a different action form category, it should be coded as a separate event. For example:
A peaceful public demonstration in Zurich was followed by violent clashes with the police.
We code one demonstration and one event of political violence.
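
The two examples above can be sketched as a simple grouping step. The snippet is a minimal illustration, not our actual tooling: the lookup `CATEGORY` is a hypothetical excerpt of the action form table, and the function assumes that all listed actions share the same actors, goal, location, and date.

```python
from typing import List

# Hypothetical excerpt of the action form table: action -> category.
CATEGORY = {
    "clashes with police": "Political violence",
    "burning tires": "Political violence",
    "breaking shop windows": "Political violence",
    "demonstration": "Demonstration",
    "march": "Demonstration",
}


def codable_events(actions: List[str]) -> List[str]:
    """Return one action form category per codable event.

    Actions of the same category collapse into a single event;
    an action of a different category yields a separate event.
    Unknown actions fall back to the residual category PROTEST.
    """
    categories: List[str] = []
    for action in actions:
        category = CATEGORY.get(action, "PROTEST")
        if category not in categories:
            categories.append(category)
    return categories


riot = codable_events(
    ["clashes with police", "burning tires", "breaking shop windows"]
)
zurich = codable_events(["demonstration", "clashes with police"])
```

With these inputs, `riot` contains a single event of political violence, and `zurich` contains one demonstration and one event of political violence.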
Sometimes, a demonstration and a counter-demonstration happen on the same date and at the same location. We code them as separate events: Although the issue is the same, the position or stance on the issue in such demonstrations is different (e.g. pro- and anti-abortion).
Event arguments are the participants and attributes of an event: Its actors, issues, location, time, and size.
We code only asserted arguments and not alleged, uncertain, or hypothetical ones. If a document only alleges that e.g. ETA is behind a bombing, we should not code ETA as the actor.
The coding of arguments is explained in what follows. The sections make reference to the interface for event coding that is introduced in part III.
A protest event is associated with actors -- individuals or organizations -- that organize the event, take part in it, or express support for it. We shall label actors using a number of attributes of interest. The selected labels should characterize the properties of the core actors well:
Below are the actor types that we would like you to annotate. The choice of actor types might look peculiar to you: Know that we have simply picked some actor types that we have seen annotated often enough in our previous annotation effort.
1st-level category | 2nd-level category | Explanation |
---|---|---|
Actor | The category shall be used only if the actor to the protest event cannot be identified by any of the properties below | |
Organization | Selecting this property indicates that organizations feature significantly among actors and they are neither political parties nor labor unions. A good indicator is when an organization is mentioned by name. | |
Political party | Selecting this property indicates that political parties or individual party members feature significantly among actors. | |
Trade union | This property indicates that unions or union members and representatives feature significantly among actors. N.B. If the document explicitly mentions a union and an occupational group as actors (e.g. miners' strike led by the XYZ union), both the labor union and occupational group labels should be selected. | |
Occupational group | Selecting this property indicates that a significant group of actors are people of the same profession or occupation, e.g. teachers, cleaners, mechanics, pensioners. | |
Selecting any of the properties below indicates the specific occupation of this significant group of actors. | ||
Farmers | Any individuals engaged in agriculture | |
Students | ||
Intellectuals | Artists, academics, experts, also trendsetters, celebrities -- individuals whose actions and beliefs have significant impact on society. | |
Below are some actor properties that group individuals based on views or background | ||
Extreme-right supporters | Selecting this property indicates that a significant part of actors can unambiguously be identified as holding extreme-right views, e.g. white power skinheads, neo-Nazis, neo-fascists | |
Left-libertarians | A significant part of actors can be identified as left-libertarians, e.g. anarchists, feminists, squatters, gays and lesbians, greens, anti-fascists, anti-war activists | |
Migrants / foreigners / refugees | A significant number of actors are migrants, foreigners, refugees, asylum seekers | |
Religious groups / fundamentalists | A significant group of actors can be identified as holding some specific religious beliefs. N.B. We shall code Northern Irish protesters with this label if the document refers to them directly by the confession names Catholics and Protestants. |
Below are some examples. Actors are shown in bold. Events are shown in square brackets.
Example | Labelling | Comment |
---|---|---|
the protest action by the Communication Workers' Union | trade union | |
protesters marched through the streets of Galway | ACTOR | |
[protest] organised by environmental activists | left-libertarians | |
1,000 pensioners to come to Riga for protest [rally] on Saturday | occupational group | |
anti-immigrant [demonstration] | ACTOR | We do not try to guess political views of actors |
In Ghent, 249 people are set to [undress] to symbolize the 249 days without government | ACTOR | |
Tensions in the city have spiralled since construction worker David Caldwell was killed in a dissident republican bomb [attack] last week. | ACTOR | |
A series of bomb [attacks] by ETA at tourist targets through the summer was "only a warm-up" | organization | |
During his visit, Manu Chao participated in a [protest] outside of the office of Maricopa County Sheriff Joe Arpaio... | intellectuals | |
A group of Spanish politicians, intellectuals, artists and environmentalists Thursday [called] for the abolition of bullfights, describing them as an obstacle to the defence of animal rights. | intellectuals, left-libertarians | |
More than 2,000 of Lithuanian residents including politicians and show business stars [demonstrated] their solidarity with Estonia ... | intellectuals | |
Earlier in the day, about 500 right-wing extremists had [gathered] to protest the high number of foreigners living in Greece. | extreme-right supporters |
As you can see, pretty much all combinations of the actor labels are possible. A few important things to keep in mind:
In addition to annotating protest size in the text, we would like you to interpret it as lying within some numerical range. We borrow the ranges and their rough characterization from the Dynamics of Collective Action project:
Small number, few, handful | 1-9 people |
Group, committee | 10-49 people |
Large number, gathering | 50-99 people |
Hundreds, mass, mob | 100-999 people |
Thousands | 1,000-9,999 people |
Tens of thousands | 10,000 or more people |
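
When the text gives an explicit number of participants, the mapping to a range is mechanical. A minimal sketch, assuming a hypothetical helper `size_range` that is not part of the coding interface:

```python
# (lower bound, upper bound or None for open-ended, rough characterization)
SIZE_RANGES = [
    (1, 9, "Small number, few, handful"),
    (10, 49, "Group, committee"),
    (50, 99, "Large number, gathering"),
    (100, 999, "Hundreds, mass, mob"),
    (1_000, 9_999, "Thousands"),
    (10_000, None, "Tens of thousands"),
]


def size_range(participants: int) -> str:
    """Return the characterization of the range that contains `participants`."""
    if participants < 1:
        raise ValueError("a protest event needs at least one participant")
    for low, high, label in SIZE_RANGES:
        if low <= participants and (high is None or participants <= high):
            return label
    raise ValueError("no matching range")  # not reachable for participants >= 1


ghent = size_range(249)    # the 249 people set to undress in Ghent
picket = size_range(1)     # a one-person picket
```

Vague expressions in the text (e.g. a handful, a mob) still require your interpretation; the sketch only covers explicit counts.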
We kindly ask you to code the country in which the protest is reported as taking or having taken place. The coding interface lets you choose among EU-27 plus Norway, Iceland, and Switzerland. Most protest events that you will encounter will be from these countries. We shall also code occasional protest events that are not from the countries on this list. In that case, we ask you to put the country name in the Comment field of that event.
Annotating time expressions in brat is not enough: We would like you to interpret each expression -- often with respect to the publication date -- and enter it in calendar form into the date range field.
Sometimes, a protest event is mentioned without a specific date (e.g. bombings in Madrid last year). In that case, you should indicate a date range that more or less corresponds to this unspecific description and signal that the range should not be understood literally (you will do this by checking the box with the approximately equal sign). In this way, we know that a date lies in the range that you have indicated but does not necessarily span the whole range.
Below you will find some rough guidance on translating approximate time expressions to calendar form:
in February, last February | the whole month (e.g. 01.02.2007 - 28.02.2007) |
in late June | the 21st through the end of that June (e.g. 21.06.2007 - 30.06.2007) |
in early June | the 1st through 10th of that June (01.06.2007 - 10.06.2007) |
mid-June | the 11th through 20th of that June (11.06.2007 - 20.06.2007) |
in 2007 | the entire year (01.01.2007 - 31.12.2007) |
early 2007 | January through February of 2007 (01.01.2007 - 28.02.2007) |
late 2007 | November through December of 2007 (01.11.2007 - 31.12.2007) |
mid-2007 | June through July of 2007 (01.06.2007 - 31.07.2007) |
earlier that week, at the beginning of the week | Monday-Tuesday of that week |
later that week, at the end of the week | Saturday-Sunday of that week |
Note that the interpretation of these expressions may vary and it is up to you to provide a reasonable date range. That being said, do not think too hard about the best translation to calendar form -- the interpretation will inevitably vary somewhat from individual to individual.
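
For the month and year expressions, the rough guidance above can be written down as a small lookup. This is just one possible reading, sketched with the standard library; the function names `month_range` and `year_range` and the keywords `early`, `mid`, `late`, `whole` are assumptions made for this illustration.

```python
import calendar
from datetime import date
from typing import Tuple

DateRange = Tuple[date, date]


def month_range(year: int, month: int, part: str = "whole") -> DateRange:
    """Date range for e.g. 'in early June', 'mid-June', 'in late June', 'in June'."""
    last_day = calendar.monthrange(year, month)[1]  # number of days in the month
    days = {
        "whole": (1, last_day),
        "early": (1, 10),
        "mid": (11, 20),
        "late": (21, last_day),
    }
    first, last = days[part]
    return date(year, month, first), date(year, month, last)


def year_range(year: int, part: str = "whole") -> DateRange:
    """Date range for e.g. 'early 2007', 'mid-2007', 'late 2007', 'in 2007'."""
    months = {"whole": (1, 12), "early": (1, 2), "mid": (6, 7), "late": (11, 12)}
    first_month, last_month = months[part]
    last_day = calendar.monthrange(year, last_month)[1]
    return date(year, first_month, 1), date(year, last_month, last_day)


late_june = month_range(2007, 6, "late")  # 21.06.2007 - 30.06.2007
early_2007 = year_range(2007, "early")    # 01.01.2007 - 28.02.2007
```

Whenever you enter such an approximate range, remember to check the box with the approximately equal sign.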
A protest event is typically associated with claims or grievances which we refer to as issues. An event can have any number of issues, including zero. The table below introduces and explains the set of issues that we would like to have annotated.
The issues are organized hierarchically, and specific issues come with a position on the issue fixed. Given an event, you should first ask yourself if any of the specific - second-level - issues apply. This means that both the issue and the position on the issue match. If this is not the case, you should go for a less specific - first-level - issue. First-level issues do not differentiate the position. For example, if you come across an event that calls for more nuclear energy projects - something that using our vocabulary of issues could be dubbed for nuclear energy - you will have to label this issue with a first-level category, namely environmental protection: Although the issue of nuclear energy is on our list of issues, the position does not match.
1st-level category | 2nd-level category | Position | Explanation |
---|---|---|---|
Issue | A residual category: Issues that do not fall under any category below should simply be labelled as “issues” | ||
Social protection and rights | A residual category for the group of issues related to social protection and rights, to be used when no second-level category issue below is explicitly reported, including when the position on any second-level category does not match | ||
Social rights | For | In favor of the expansion of the welfare state and adoption of new social rights: calls for employment and care programs, coverage of new social risk. Against the retrenchment of the welfare state and cutting of existing social rights: against cuts in pensions, unemployment benefits, social aid, and other social security services | |
Labour rights | For | Issues related to labor market regulation: shorter working hours, higher minimum wage. For job protection, maintaining jobs in the local industries | |
Education | For | For a better educational system: more money for education, better quality of education | |
Economic protection and regulation | A residual category for the group of economic issues, to be used when the issue is related to the economy but does not fall under any of the specific categories below | ||
Budgetary rigor | Against | Against austerity measures, rigid budgetary policy, reduction of state deficit | |
Regulation of the economy | For | Support for state intervention into economy: maintaining the public sector, regulation of the financial/banking sector; in favor of regulation of international trade | |
Civil rights | A residual category for the group of issues related to civil rights | ||
Immigration | For | Pro migration, for the improvement of the situation of migrants, refugees, minorities in the country of residence, for multiculturalism | |
Against | Expression of racism, xenophobia, demands for restrictive immigration policies; against migrants, refugees, ethnic minorities | ||
Cultural liberalism | For | In favor of gender equality, homosexuals' rights, squatters, alternative lifestyles, abortion rights, the right to euthanasia, less traditional values and lifestyles | |
Against | Defending cultural conservatism and traditional values, against abortion rights, against gay marriage | ||
Institutional reforms | A residual category for all issues related to institutional reforms that are not covered by the specific issues listed below | ||
Regionalism | For | Support for separatism and more regional independence, such as Catalan, Basque, Scottish independence | |
Democracy | For | Calls for more or real democracy; institutional reforms such as separation of power, alternative electoral systems, strengthening of parliament/judicial bodies, more accountability, direct democracy. Also, the fighting of corruption and other measures of increasing the quality of government | |
Police power | Against | Opposition to police power, criticism of police violence and repression | |
Environmental protection | Issues related to environmental protection, nature conservation, climate change, animal rights, consumer protection such as controlling the production of GMOs, food labelling, biological agriculture. Also, issues related to infrastructure projects: construction or development of private transportation, airports, waste disposal, dams, etc. | ||
Nuclear energy | Against | Specifically, opposition to nuclear energy projects | |
International cooperation | A residual category of issues pertaining to international matters that are not covered by the specific issues below | ||
Peace | For | Calling for maintaining peace and against the military and military actions, for disarmament, dismantling of the army, reduction of spending for the army and defense | |
Human rights | For | Calling for sanctions against perpetrators, for protection of ethnic groups, economic, diplomatic, or military intervention for the sake of protecting human rights of third parties (Rwanda, Darfur, Iran, Taliban, ISIS, etc.) | |
Europe and European integration | For | Support for European integration, Euro/EMU and supranational management of the financial crisis | |
Against | Opposition to European integration, Euro/EMU, supranational management of the financial crisis | ||
Globalization | Against | Opposition to the globalization of corporate capitalism, criticism of G8, G20, WTO |
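
The fallback from a second-level to a first-level issue can be sketched as follows. The snippet is only an illustration: `SECOND_LEVEL` is a hypothetical excerpt of the table above, and the label strings it produces are an assumption about naming, not the labels used in the interface.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical excerpt of the issue table:
# second-level issue -> (first-level issue, positions fixed in the table).
SECOND_LEVEL: Dict[str, Tuple[str, FrozenSet[str]]] = {
    "Nuclear energy": ("Environmental protection", frozenset({"Against"})),
    "Peace": ("International cooperation", frozenset({"For"})),
    "Immigration": ("Civil rights", frozenset({"For", "Against"})),
}


def issue_label(issue: str, position: str) -> str:
    """Pick the second-level label if the position matches, else the first-level one."""
    first_level, positions = SECOND_LEVEL[issue]
    if position in positions:
        return f"{position} {issue.lower()}"
    # Position does not match: fall back to the less specific category,
    # which does not differentiate the position.
    return first_level


pro_nuclear = issue_label("Nuclear energy", "For")
anti_nuclear = issue_label("Nuclear energy", "Against")
```

Here `pro_nuclear` falls back to `Environmental protection`, as in the example discussed above, whereas `anti_nuclear` keeps the specific label.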
Below are a couple of examples. Note that typically a single sentence is not enough to make an adequate labelling decision and the context should be taken into account. The issue expression that we will annotate in brat is in bold, the event anchor is in square brackets.
Example sentence | Labelling | Comment |
---|---|---|
Some 2000 protesters had [gathered] to demand "dignified living conditions" for migrants in the French port city of Calais. | For migrants, refugees, minorities' rights | |
Protesters opposing same-sex marriage have gathered in Sydney for a [protest], where they [clashed] with a small group of counter-protesters. | Against cultural liberalism | |
Conservation groups have united in [protest] against the planned new road. | Environmental protection | |
Foreigners [attacked] in retaliation for Cologne New Year's Eve assaults | Against migrants, refugees, minorities' rights | |
On the village side of the gate, a mob was [picketing] and [chanting], Ireland for the Irish and Brits go Home. | For regionalism | assuming this takes place in NI |
Anti-NATO protesters [rallied] in the Serbian city of Nis on Friday. | For peace | |
Thousands [protested] the election fraud | For democracy | |
Doctors and patients [protested] against plans to cut services at the hospital | For labour rights | |
The protesters [demanded] that the two officers involved in the shooting be criminally charged. | Against police power |
We shall not annotate more than the two most central issues per protest event. It is conceivable that at a large demonstration, many kinds of grievances are voiced. However, we expect that a small set of core issues unites the actors, and these are the issues that we want to have coded.
Further comments on coding issues:
In this part, we shall cover the rules for annotating text using the brat annotation interface. Some of the material below comes from the ACE guidelines for events and entities, including unattributed quotes.
An event mention is a mention of some event at a specific point in a text.
Important: In line with the exhaustivity principle, we should annotate all the mentions of all codable events in a given text, even if an event mention does not provide new information or is not the focus of the sentence that the mention occurs in.
There are two spans of text that we need to care about when annotating the text in brat: The event sentence and the event anchor.
An event anchor is the word that most clearly expresses the occurrence of an event. Identifying event mentions is equivalent to correctly identifying event anchor words.
The specific rules for identifying event anchors are described below.
An event sentence is a sentence that contains an event anchor.
Expressions that denote event arguments are called event argument mentions. We will often be sloppy about the terminological distinction between arguments and argument mentions since it is often clear from the context which one is meant.
Below are examples of sentences that describe protest events. Event anchors are marked in bold:
- After wildcat strikes and protests by unions, the forest industry imposed a two-week lockout.
- Thousands of people rioted in Port-au-Prince, Haiti over the weekend.
- The union began its strike on Monday.
- Protesters rallied on the White House lawn.
- The rioting crowd broke windows and overturned cars.
- A crowd of 1 million demonstrated Saturday in the capital, San'a, protesting against Israel, the United States and Arab leaders regarded as too soft on Israel.
- For weeks Italian Jewish groups, World War II veterans and leftist political parties have staged protests against a meeting between the pope and Haider, arguing that a papal encounter would lend the Austrian politician legitimacy.
- More than 40,000 workers were back at their jobs Thursday following a 1-day walkout that closed social welfare offices and crippled public medical services.
- During the work stoppage Wednesday, local residents were unable to register marriages or get documents for real estate transactions.
- A car bomb exploded Thursday in a crowded outdoor market in the heart of Jerusalem, killing at least two people, police said.
- Men in civilian clothes in the crowd began firing with AK-47 assault rifles and a 45-minute gun battle broke out.
- A number of demonstrators threw stones and empty bottles at Israeli soldiers positioned near a Jewish holy site at the town's entrance.
- Around 500 people took to the town's streets chanting slogans denouncing the summit.
We do not annotate Event sentences in any way. Note that the annotation interface identifies sentences automatically: They are numbered and appear as separate paragraphs.
The important point about event sentences is this: We only annotate those event arguments that occur in an event sentence, i.e. in the same sentence as an event anchor.
The following subsections describe the process for identifying the anchors of events.
There are two core rules for identifying anchors:
Be sure to understand them well: In the vast majority of cases, one or the other rule applies.
All other rules address special cases.
Recall that an event's anchor is the word that most clearly expresses its occurrence.
A word that most clearly points to a protest event is a noun that denotes a protest action. Consider the examples below. Event anchors are marked in bold:
- The strike is the third in Spain since mid-October after work stoppages by road transport employees.
- Protests are staged more frequently with homophobic incidents occurring, such as Wednesday's attack on a gay nightclub in Lille.
- Italian state prosecutors on Monday dropped an official judicial inquiry against a police officer who killed a protester in riots during last year's Group of Eight summit in Genoa.
- ETA has been relatively quiet this year since the March 11 train bombings in Madrid that were initially blamed on the Basque separatists.
- A widely followed hunger strike was staged at the prison last month as part of an organised nationwide protest against overcrowding and poor conditions.
- Demonstrators were refusing to leave the scene and were staging a sit-in.
- Clashes erupt outside Athens prison
- The attack killed 7 and injured 20.
- The explosion claimed at least 30 lives.
Things get a bit complicated when we have to tackle expressions like strike action. One could argue that the noun that denotes the event is action whereas strike specifies a certain sub-type of action. Linguists would say that the noun strike modifies (or is a modifier of) the noun action.
We do not want you to construct arguments like that in your mind every time you see some complex expression that denotes a protest event. Our goal is simply to have one agreed way of annotating event anchors.
This is why, as a rule of thumb, we will say that the noun we pick should not be modifying another noun.
By far the most frequent cases in which this becomes an issue are expressions like strike action, protest march, protest demonstration, etc. In accordance with the rule of thumb (and in agreement with the argument), we will annotate as anchor the modified noun and not the modifying one:
11. Air traffic controllers threaten strike action
12. Protest action against outsourcing started at the University yesterday
13. A protest march against Islamophobia was held at Dundonald Park
14. A protest demonstration was held here on Wednesday against the visit of Egypt's president Abdel Fattah el-Sisi.
By the same token:
16. "We don't know who did it but ... we're satisfied this was clearly an act of terrorism," he said on CBS.
17. 2015 in particular has seen a new wave of right-wing violence, mainly against refugees.
In the absence of a noun denoting the protest Event, one often finds that the main verb most directly describes the event. The following examples mark in bold those anchors that are main verbs:
- Last year, the education unions protested against the draft budget for 2008 for many months
- Protesters demonstrated against Uganda's anti-gay proposals in London during a visit by the Ugandan prime minister.
- Several unions were calling on workers in dozens of the capital's state-funded museums, theatres and cultural centres to strike from next Wednesday over the job cuts.
- Hundreds of youths clashed with police in Manchester.
- A crowd of 40,000 supporters of the Democratic Opposition of Serbia gathered at the bridge, and were met by 100 policemen.
- Eight union representatives occupied the offices of the industry ministry in Madrid.
- Hundreds of thousands of Spaniards took to the streets for the second week in succession Saturday.
Not surprisingly, most of the interesting verbs are related to nouns denoting protest Events: demonstrate, protest, strike, block, attack, clash, boycott, picket, occupy, etc.
In all examples above, the main verb is simple, consisting of just one word. Sometimes, the main verb is complex:
have blocked off, will protest, did not let in, are demonstrating, etc.
In such cases, we annotate only the notional verb but not auxiliary verbs (have, did, is going to, will, etc.), verb particles (off, in, etc.), or negation (not):
have blocked off, will protest, did not let in, are demonstrating, etc.
Below are some real-life examples. The notional verb is in bold, the auxiliary verb and verbal particle are in italics:
8. Prescott punched a demonstrator who had thrown an egg at him at a campaign rally in north Wales.
9. The enraged taxi drivers then blocked off parliament for six hours.
In the following example, the main verbs are in the passive. We handle this in exactly the same way -- by marking the notional verbs only:
10. A journalist was attacked and highways blocked in several locations throughout Italy.
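The rule for complex verbs can be illustrated with a minimal sketch. It assumes the verb group has already been isolated from the sentence, and the three word lists are hypothetical, incomplete stand-ins for the categories named above (they do not cover multi-word auxiliaries such as is going to):

```python
# Hypothetical, incomplete word lists for illustration only.
AUXILIARIES = {"have", "has", "had", "do", "does", "did", "will",
               "is", "are", "was", "were", "be", "been", "being"}
PARTICLES = {"off", "in", "out", "up", "down"}
NEGATION = {"not"}


def notional_verb(verb_group):
    """Return the words of a verb group that remain after dropping
    auxiliary verbs, verb particles, and negation."""
    skip = AUXILIARIES | PARTICLES | NEGATION
    return [word for word in verb_group.split() if word.lower() not in skip]


for group in ["have blocked off", "will protest", "did not let in", "are demonstrating"]:
    print(group, "->", notional_verb(group))
```

The sketch only works on an isolated verb group: in running text, words like in and off are usually prepositions, so deciding what belongs to the verb still requires the annotator's judgement.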
The condition that there is no noun denoting a protest Event should not put you off. The intuition is that if such a noun is present, no main verb can better indicate the occurrence of a protest event. The examples from Section 2.3.1. illustrate this point.
Contrast and internalize the following:
Protesters staged a sit-in vs Protesters pelted bricks at the police
Some more examples with verbs stage, organize, hold or similar:
11. Armenian youth of Moscow to organize picket at Turkish embassy on first anniversary of Hrant Dink's murder
12. KKK will hold rally at South Carolina State House to protest removal of Confederate flag
13. Drivers of Austria's "Postbus" service held protest meetings on Thursday against feared staff cutbacks.
In the last example, observe also that we pick meetings over protest.
So far so good with the core rules. We will now deal with rules that cover some special cases that we would like you to pay attention to.
In rare cases when no core rule applies, you can take a noun modifying another noun as the anchor, provided the noun does indeed denote the protest Event.
London anti-war demonstration participants speak out against war
In rare cases when no core rule applies, the event anchor can be a participle that modifies an actor noun (rioting, striking, protesting):
- Shots fired against striking miners in Poland.
- The company manager addressed the protesting crowd.
In the following example the participle itself comes with modifying words:
3. The crowd, chanting anti-government slogans, was met by a heavy deployment of riot police.
Whenever both a participle and a main verb occur in the same sentence and denote pretty much the same event, the main verb should be annotated since the core rule has precedence over any special rule:
4. The rioting crowd broke windows and overturned cars.
In case a participle and a main verb denote different events, we should annotate both of them.
With just these 4 rules, you should already be able to correctly identify most event anchors we need.
To be absolutely sure that we are all on the same page, we add a few more sections that discuss difficult, borderline cases and provide guidance for situations that look like there are multiple anchors possible.
There are some nouns that refer to Event participants and simultaneously imply the occurrence of an Event, such as demonstrator, protester, attacker. These should never be annotated as Event anchors for two reasons:
There are cases where several verbs are used together to express an event:
- Men in civilian clothes in the crowd began firing with AK-47 assault rifles.
- The miners continued to strike for a further six months and were eventually forced to accept longer hours, lower pay and local agreements.
- Incensed taxi drivers Friday tried to storm the Bulgarian parliament.
- On July 27, 1987 they attempted to bomb the Family Planning Associates Medical Group with six other members of the church.
Verbs begin, continue, stop, carry on, try, attempt, etc. present some aspect of the event (e.g. the start of the event), and the event itself is expressed by the verb that comes after.
When protest event anchors come listed, we shall annotate every item of the list, even if some of the anchors denote the same event:
- A number of demonstrations and actions today are taking place in memory of the murder of 15-year old Alexis Grigoropoulos by a police officer in Athens.
- There are number of marches and rallies taking place on the 30th November in the North West region.
- rioting, looting and arson attacks
- petrol bombs were thrown, business premises attacked and cars torched
- demonstrators scuffle with, throw Molotov cocktails at athens police
Generally, we do not consider speech acts as protest. In those cases where no better anchor is available, speech act words can be annotated as event anchors:
1. Hundreds of protesters demanded the president's resignation.
In the context of public protest, common speech act verbs are demand, press for, call on, call for.
We shall annotate death / bomb threats:
2. A far-right Austrian politician has received death threats from an Islamic group after she made several inflammatory remarks about Islam
Related to the previous rule, in phrases and sentences like
- staged a blockade in protest against ...
- gathered to protest against ...
- demonstrated protesting against ...
- Students in downtown Riga grill potatoes in protest against reduced higher education funding.
the word protest introduces the protest issue and not the protest action. Its meaning is close to that of a speech act word. To see this, compare e.g. in protest against Iraq war and demanding end to Iraq war.
In such cases, typically the sentence contains a better descriptor of the protest event, which we shall take as the event anchor. This is however not the case in the following example:
5. Students in protest over appointment row
In agreement with the noun rule, we annotate the plural noun denoting protest events as anchor and not counting words that come with it:
- A number of demonstrations today are taking place in memory of the murder of 15-year old Alexis Grigoropoulos by a police officer in Athens
- On the evening of 13 November 2015, a series of coordinated attacks occurred in Paris and its northern suburb, Saint-Denis.
- Deadly blast devastates Turkish police HQ in Kurdish region, string of attacks reported
Some counting words that you should watch out for are a number of, a string of, a series of, a spate of.
Structurally, a series of attacks is close to an act of terrorism or a wave of violence. The difference is that a series of and similar expressions act as counting words for countable nouns and could easily be replaced with cardinal numerals (2, 3, 4, ...), e.g. seven attacks.
If a protest event and an event resulting from it occur together, the resultant event is never a valid event anchor:
- received death threats
- A man was wounded in stabbing
- shot in a paramilitary-style attack
Clearly, neither receiving, nor wounded are protest events. Threats and stabbing are. Some more examples:
4. killed when a car bomb exploded
5. The enraged taxi drivers blocked off parliament, causing enormous traffic jams throughout the city center.
Neither killed nor causing denote protest events, hence they cannot be event anchors.
Unfortunately, not all protest actions are named directly. Some types of protest events are strongly associated with objects that are used to indirectly refer to the event. For example, the word letter often stands for letter protest or bomb for bombing:
- "Meanwhile about 30 funerals were scheduled for Friday, two weeks after the attacks where eight people died in a car bomb in Oslo and 69 were killed in a shooting at a youth camp organized by the Labour Party on the island of Utoya."
Contrary to the rule that says we should annotate nouns and verbs denoting protest actions, sometimes you might need to annotate the object if no better choice is available:
2. Police intercepted the bomb in Londonderry after noticing a vehicle acting suspiciously on the Foyle Bridge early this morning.
3. Republicans have also been concerned about loyalist paramilitary activity, with a Sinn Fein member, Michael Agnew, discovering a pipe bomb under his van yesterday outside his home in Ballymena, Co..
Normally, you should however find an anchor in the form of a verb that expresses the protest act:
4. She was also suspected of having sent mail bombs to a number of journalists, he added.
5. "The Jeep got within about seven metres (yards) of passengers queuing in the terminal building before his passenger threw two petrol bombs, while Kafeel Ahmed also appeared to throw a bomb, Laidlaw said."
6. The unidentified culprits lobbed four fire bombs at police stationed outside the party's offices at 0140 [2340 gmt], which ignited in the street outside without causing damage or injuries.
7. A bomb exploded Wednesday at a deserted shopping center in Vitoria -- administrative capital of Spanish Basque country -- ripping off part of the roof but causing no injuries.
The same applies to words letter, manifesto, poster:
8. The letter is expected to be published in Monday's issue of the British daily The Financial Times.
9. Klaus reacted to an open letter signed by the current and former chairmen of the Students' Chamber of the Masaryk University Academic Senate, in which students expressed disagreement with Klaus speaking on the university ground.
Also:
10. sign a protest letter
This case is similar to stage a demonstration (signing the letter equates to the participation in protest), which is why letter should be taken.
Titles provide useful summaries of events. We ask you to carefully annotate titles, especially in cases where you find metaphoric references to the protest covered in the rest of the article:
Fury in Bulgaria after kidnapped child found dead in lake
Incensed taxi drivers Friday tried to storm the Bulgarian parliament and ...
This section discusses the annotation of event argument mentions in brat.
Recall that we annotate only those arguments (actors, protest size, time, etc.) that occur in the same sentence as the event that they are associated with.
Important: The following scenario is not uncommon: You have coded some information about an event at the document level (e.g. an issue), but this information is only expressed in a sentence where no anchor of this event occurs. In this case, we simply do not annotate this information in brat.
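The same-sentence constraint can be checked mechanically. The sketch below assumes that the document text has one sentence per line (mirroring how the annotation interface displays sentences) and that mentions are given by character offsets; both function names are hypothetical, and the second sentence of the sample text is made up for illustration:

```python
def sentence_index(text, offset):
    """Index of the line (treated as one sentence) containing a character offset."""
    return text.count("\n", 0, offset)


def same_sentence(text, anchor_start, argument_start):
    """True if an argument mention starts in the same sentence as the anchor."""
    return sentence_index(text, anchor_start) == sentence_index(text, argument_start)


# First sentence taken from the examples above; the second is invented.
text = "The union began its strike on Monday.\nWorkers demand higher pay."
anchor = text.index("strike")
print(same_sentence(text, anchor, text.index("Monday")))  # argument in the event sentence
print(same_sentence(text, anchor, text.index("pay")))     # argument in another sentence
```

An argument for which the check fails would be left unannotated in brat, even if the same information was coded at the document level.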
In order to indicate in brat that an argument belongs to some event, you should draw an arrow from the event to the argument. A dialogue window pops up, asking you to confirm the relation type. To close the dialogue window, hit Enter.
If an argument clearly belongs to an event, it must be connected to the event. However, if more than one event anchor of the same event occurs in the same sentence, it is sufficient to connect each argument only once to any of the anchors.
An event anchor may be connected to any number of arguments of one type, including none.
In the case where an entity is clearly an argument to one event in the sentence, but also applies quite reasonably to another event in this sentence, it should be annotated as an argument of both events.
Argument mentions, with the exception of issues, cannot overlap with other argument mentions or event mentions. This also rules out the overlapping of mentions of one kind. However, issues may contain or overlap with any other argument or event: Unlike other arguments, issues are typically expressed with multi-word phrases or clauses.
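The overlap rule can be expressed as a small consistency check. This is only a sketch: the label names (Actor, Size, Issue, ...) and the Mention type are hypothetical, and spans are assumed to be character offsets with an exclusive end:

```python
from typing import NamedTuple


class Mention(NamedTuple):
    label: str  # hypothetical label names, e.g. "Anchor", "Actor", "Issue"
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive


def overlaps(a, b):
    """True if two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def illegal_overlaps(mentions):
    """Return all overlapping pairs in which neither mention is an issue."""
    bad = []
    for i, a in enumerate(mentions):
        for b in mentions[i + 1:]:
            if "Issue" in (a.label, b.label):
                continue  # issues may contain or overlap anything
            if overlaps(a, b):
                bad.append((a, b))
    return bad


mentions = [Mention("Actor", 0, 10), Mention("Size", 5, 8), Mention("Issue", 0, 20)]
print(illegal_overlaps(mentions))
```

Here only the Actor/Size pair is reported; the Issue mention is allowed to cover both.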
In brat, move your cursor to the top and click on Option on the right-hand side. A dialogue window pops up that -- among other things -- allows you to set the annotation mode. Set it to Normal: This will speed up the annotation interface when using hot keys.
In order to label a span of text without going into the dialogue window, select the span, release the mouse / touchpad, and immediately hit one of the hot keys. The following keys are defined:
All argument mentions are continuous strings of words (sometimes also punctuation marks). Although the brat annotation tool allows for annotating discontinuous spans, we shall not use this function.
Even when we annotate phrases, we shall not annotate articles unless they are caught up inside an annotation. See also another exception when we annotate clauses as issues.
We shall include into an argument mention all punctuation marks that occur inside the mention, however not the punctuation mark that follows the mention, e.g.:
February 22, 2012
The first industrial [action] by workers in defence of a final salary pension scheme was today being launched by steelworkers .
In the first example, the comma is part of the time mention. In the last example, notice that the second actor mention does not include the final full stop.
In a similar manner, we will not annotate the possessive 's and ', e.g.:
The trade union's plans are ...
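As a rough illustration of these boundary conventions, the following sketch trims a candidate span: it drops a leading article, any punctuation that follows the mention, and a trailing possessive marker, while keeping punctuation inside the mention. The function name and the article list are assumptions made for this example only:

```python
ARTICLES = ("a ", "an ", "the ")


def trim_mention(span):
    """Trim a candidate argument mention to the boundaries described above."""
    text = span.strip()
    # Punctuation that follows the mention is not part of it.
    text = text.rstrip(".,;:!?").rstrip()
    # Neither is the possessive marker ('s or ').
    if text.endswith("'s"):
        text = text[:-2]
    elif text.endswith("'"):
        text = text[:-1]
    # A leading article is dropped; articles inside the mention stay.
    lowered = text.lower()
    for article in ARTICLES:
        if lowered.startswith(article):
            text = text[len(article):]
            break
    return text


for candidate in ["February 22, 2012", "steelworkers .", "The trade union's"]:
    print(repr(trim_mention(candidate)))
```

The comma inside the date survives because only characters at the edges of the span are removed.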
Recall that actors are individuals or organisations that perform a protest action.
We are primarily interested in two types of Actors: collective actors and organization actors.
Collective actors are often expressed by descriptions like workers, farmers, extremists, terrorists, activists, however the descriptions are not necessarily always plural (since we allow for protest events with one person as the sole actor too).
An organization actor is expressed by the name of an organisation, typically a trade union, party, or protest group.
Occasionally, we find named individuals as protest actors (e.g. in musician Manu Chao joined the demonstration). We shall think of named individuals as collective actors.
We shall annotate all actor mentions, be it very specific (e.g. the protest action by the Communication Workers' Union) or least informative (e.g. protesters marched through the streets of Galway).
Typically, the actor is expressed by
In the examples above, event anchors are in square brackets e.g. [protest].
We shall annotate organization actors and collective actors differently. For collective actors, we will simply annotate the noun which describes the occupation, profession, or affiliation with some group. For organization actors, we will annotate the full name of the organization.
Here are some examples of annotated collective actors. Event anchors are again in square brackets:
About 200 protesters [scuffled] and threw Molotov cocktails at police in downtown Athens Saturday night when the authorities blocked them from reaching an anti-immigrant [demonstration].
1,000 pensioners to come to Riga for protest [rally] on Saturday.
In Ghent, 249 people are set to [undress] to symbolize the 249 days without government.
We shall not annotate representatives or chairpersons of protesting organizations as actors. Instead, we shall annotate organization names. For example, we shall annotate Stop Temelin as an actor and not initiators:
Its initiators from Stop Temelin, supported by other opponents of nuclear energy from Upper and Lower Austria, told CTK that the [action] will probably last "a few hours."
In rare cases, a collective actor is an adjectival modifier:
Tensions in the city have spiralled since construction worker David Caldwell was killed in a dissident republican bomb [attack] last week.
Typically, the name of an organization, party, or protest group is written with capitalized words (ETA, Conservatives, Teachers' Union).
We shall not annotate:
Here are some examples:
The executive of the Fire Brigades Union was meeting today to discuss tactics in the long running dispute as a 24-hour [strike] by firefighters looked certain to go ahead tomorrow because of continued deadlock over their pay claim.
... CWU's spokesman XYZ was quoted as saying.
If the name of an organization is not mentioned, we shall annotate the actor's noun -- just like we would annotate a collective actor:
Nevertheless, the trade unions are determined to keep up [protests] against the draft national budget for 2011 ...
Names of individuals are odd ones. We shall annotate them as follows:
So, the idea is to annotate the name like a collective actor if possible; if not, we have to annotate it like an organization actor.
Here are some examples:
During his visit, musician Manu Chao participated in a [protest] outside of the office of Maricopa County Sheriff Joe Arpaio...
The participants in the [demonstration] convoked by musician Eduard Bartusek and journalist Tomas Kabrt appealed on ...
However
During his visit, Manu Chao participated in a [protest] outside of the office of Maricopa County Sheriff Joe Arpaio...
If Actors are listed, we shall annotate each of them as a separate actor.
The National Union of Journalists and Unite ...
More than 2,000 of Lithuanian residents including politicians and show business stars [demonstrated] their solidarity with Estonia ...
A group of Spanish politicians, intellectuals, artists and environmentalists Thursday [called] for the abolition of bullfights, describing them as an obstacle to the defence of animal rights.
Also, sticking to the rule all the way:
During his visit, musician and activist Manu Chao participated in a [protest] outside of the office of Maricopa County Sheriff Joe Arpaio...
We shall not annotate pronouns that express actors, even though this loses us information.
He then joined the attackers of the Woodstock bar, [throwing] rocks the size of a fist into the bar's windows.
Most of the [protests] were peaceful but [clashes] erupted in Hamburg, where police used water canon to disperse about 2,000 demonstrators, some of whom [threw] bottles and stones at the officers.
He and some should not be annotated.
We shall also annotate expressions that refer to the number of people participating in protest events. Recall that we are interested in protest events of any size including one-person protests. Here are some examples of protest size expressions:
Earlier in the day, about 500 right-wing extremists had [gathered] to protest the high number of foreigners living in Greece.
1,000 pensioners to come to Riga for protest [rally] on Saturday.
In Ghent, 249 people are set to [undress] to symbolize the 249 days without government.
About 250 people [gathered] near St. Wenceslas statue in Prague's Wenceslas Square to express their disagreement with Gross 's behaviour in the explanation of the unclear finances of his family.
More than 2,000 of Lithuanian residents including politicians and show business stars [demonstrated] their solidarity with Estonia ...
We only annotate expressions that directly refer to the number of people participating in a protest event, and never the number of organization actors. For example, in the following no protest size expression should be annotated:
The 28-strong coalition of groups representing farmers, consumers, scientists, environmentalists, aid volunteers and politicians ...
We shall annotate both maximally specific protest size expressions (e.g. 249 people) and very loose quantifiers of protest size (e.g. few people joined the protest).
Typically, there are two types of protest size expressions that we encounter:
Most often, you will see expressions of the first kind. However, because we occasionally have expressions of the second kind as well, protest size is an argument of events and not actors.
We shall try to capture all information about protest size including all words expressing uncertainty about the exact number of participants like around, not more than. More specifically, we shall annotate:
We shall also include in the annotation
5. Expressions that modify words from 1, 2, and 3 e.g. only, not more than, about, just, some, as many as.
Combining 5 with rules above, we get the following protest size annotations:
about 300 protesters
not more than 50 people
not so many protesters
50,000 to 100,000 protesters
only a small group of demonstrators
If a noun phrase expresses protest size (point 3), we shall not annotate
Thus, we have
a number of intellectuals
where a and of are not part of the annotation.
Note however the article and preposition of caught up inside of the annotation:
only a small group of demonstrators
tens of thousands of protesters
This is because discontinuous annotations are not allowed.
We shall also annotate all modifiers of the noun (great, small), e.g.
a great number of intellectuals
a small group of demonstrators
Although an indefinite article indicates that there is a single actor, we shall not annotate it, to be consistent with the general rule about articles, unless the article is caught inside an annotation.
We shall annotate expressions that indicate the geographical location of a protest event.
In principle, we are interested in locations of the level of human settlements (city, town, village, occasionally -- road, border crossing). If a sentence does not contain this kind of location information, we shall annotate a lower-level location description (district, street, square). If this information is also absent, then we shall take any higher-level location description (region, country). Schematically, this could be represented as follows:
city/town/village > district/street > country/region
Here are some examples. Expressions that should not be annotated are in blue font:
German police said 50,000 to 100,000 protesters [gathered] near Heiligendamm.
According to the explanations given by the Basque regional interior minister, Javier Balza, about the bomb [attack] which occurred this morning in Portugalete ...
About 250 people [gathered] near St. Wenceslas statue in Prague Wenceslas Square to express their disagreement with Gross's behaviour in the explanation of the unclear finances of his family.
As the sentences mention cities as the locations of the events (Heiligendamm, Portugalete, Prague), we shall annotate the city names and neither the higher-level location descriptions (German, Basque) nor the lower-level location description (Wenceslas Square or even St. Wenceslas statue).
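The preference order can be sketched as a simple lookup. The sketch assumes that each candidate location in a sentence has already been assigned one of three levels; the level names and the function are hypothetical, and the later rules on modifiers and common nouns are not modelled:

```python
# Preference order from the scheme above (level names are made up for this sketch):
# city/town/village > district/street > country/region
LEVEL_PRIORITY = ["settlement", "sub_settlement", "region"]


def locations_to_annotate(candidates):
    """Given (text, level) pairs from one sentence, return the texts at the
    most preferred level that is present."""
    for level in LEVEL_PRIORITY:
        chosen = [text for text, lvl in candidates if lvl == level]
        if chosen:
            return chosen
    return []


print(locations_to_annotate([("German", "region"), ("Heiligendamm", "settlement")]))
print(locations_to_annotate([("Wenceslas Square", "sub_settlement"), ("Prague", "settlement")]))
```

If a sentence mentions only a country or region, the last level is used as the fallback.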
Most commonly, Locations are of two kinds:
The main idea is that we shall annotate capitalized names without any modifiers. In the cases where sentences do not contain placenames, we shall annotate nouns and adjectives like country, capital, nationwide -- also without modifiers.
We shall annotate names of locations that are written in capitalized words. We do not annotate:
Sometimes, a placename is written together with higher-level location specifiers e.g. Garumna, Ceantar na nOileán, Co. Galway. In such cases, we shall annotate only the placename itself and not
5. any of the following specifiers (e.g. Garumna, Ceantar na nOileán, Co. Galway).
Here are some examples:
[Riots] started in the northern London district of Tottenham.
About 200 protesters [scuffled] and [threw] Molotov cocktails at police in downtown Athens Saturday night when the authorities blocked them from reaching an anti-immigrant [demonstration].
Workers and members of the public joined the [demonstration] in Newport, south Wales, mounted by the Communication Workers' Union (CWU) as part of plans to fight the job cuts at the Solectron factory at Cwmcarn.
We shall annotate location-related adjectives and nouns which start with a capital letter. We shall not annotate any modifiers:
A group of Spanish politicians, intellectuals, artists and environmentalists Thursday [called] for the abolition of bullfights ...
Occasionally, we shall annotate common nouns and adjectives. This rule applies only if no placename or location-describing adjective is available. Further, the rule is restricted to expressions denoting a town/city/village or country/region (i.e. no streets, squares, districts, etc.).
We shall annotate expressions like the country, nationwide, the capital only when they could be relatively easily linked to the actual country / town name given the dateline (e.g. LONDON (AP) June 2) or the neighboring sentences. In such cases, we shall annotate only the noun or adjective without any modifiers. Here are some examples:
This was the country's largest demonstration since the anti-nuclear campaigns of the 1980s.
The government is facing nationwide [protests] and [demonstrations].
Similarly, we shall also annotate general in general strike because it indicates that the protest is nationwide.
Meanwhile, unions have threatened a general [strike] on the 31st of May.
In the following example, we annotate German as rule 6.5.2. takes precedence:
Demonstrations are held in many German towns.
Below, we annotate country and not towns since this sentence provides no information about specific settlements:
Demonstrations are held in many towns throughout the country.
In a list of locations, we shall annotate each location separately, e.g.:
Organizers of the demonstrations in Berlin, Stuttgart, Cologne and Dresden said they were rallying against racism and xenophobia.
We shall annotate expressions that describe the time of a protest event, typically its date. Here are some examples:
Kent police has released extra travel advice ahead of today's [protests].
About 200 protesters [scuffled] and [threw] Molotov cocktails at police in downtown Athens Saturday night when the authorities blocked them from reaching an anti-immigrant [demonstration].
Nevertheless, the trade unions are determined to keep up [protests] against the draft national budget for 2011 and will organize another [demonstration] in early December.
The meaning of a time expression usually depends on the context, and very specific dates like 22 February 2003 are rare. We commonly find expressions like 22 February, today, Friday, or this morning. All these expressions can be disambiguated well if one knows the publication date of the document.
We shall only annotate time expressions that help identify the calendar date or dates of a protest event given the knowledge of the publication date.
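To make concrete what resolving a time expression against the publication date means, here is a minimal sketch. It handles only today, yesterday, and bare weekday names, and it assumes that a weekday name refers to the most recent such day on or before the publication date; that assumption is ours and will be wrong for announcements of future events:

```python
from datetime import date, timedelta

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def resolve(expression, published):
    """Map a time expression to a calendar date, or None if it cannot be
    resolved from the publication date alone."""
    expr = expression.lower()
    if expr == "today":
        return published
    if expr == "yesterday":
        return published - timedelta(days=1)
    if expr in WEEKDAYS:
        # Assumption: the most recent such weekday, possibly the publication day itself.
        days_back = (published.weekday() - WEEKDAYS.index(expr)) % 7
        return published - timedelta(days=days_back)
    return None


published = date(2001, 3, 2)  # publication date of the Czech News Agency example
for expr in ["today", "Wednesday", "several minutes after his speech"]:
    print(expr, "->", resolve(expr, published))
```

An expression that is relative to some other event comes back as None, since it cannot be dated without first dating the reference event.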
We shall not annotate time expressions that indicate the time of an event relative to some other event like in the examples below:
He was [attacked] by protesters several minutes after his speech.
More [protests] took place after the elections.
Whereas time expressions like these provide enough information for a human to localize an event in time, for a machine, this would involve an additional inference step (finding out the time of the reference event) which makes things hopelessly complicated. In such cases, we shall simply assume that no time expressions are available.
We encounter two types of time expressions:
We shall annotate all words that describe time except articles and prepositions. Concretely:
Recall that it is a general rule that we do not annotate articles.
We shall not annotate any prepositions unless they are caught up inside an annotation (see 7.3.3.). This parallels the annotation of locations that also excludes prepositions (e.g. demonstrations in Prague).
Here are some more examples:
It is believed that the protesters who [clashed] with police had taken part in a peaceful [demonstration] organised by the group No Borders to coincide with similar [protests] held in Paris earlier today.
Thousands march in anti-war Easter protests in Germany
As with other argument mentions, we shall annotate each time expression of a list separately, e.g.:
Today and yesterday's [demonstrations] attracted large numbers of supporters of right-wing parties.
We shall also annotate duration expressions whenever they help identify the date of an event:
Shannon Mann [...] organized a [protest] from February 16 to February 28 near one of the capture pens to observe the removal of the horses.
Observe again that we do not annotate from but we do annotate to inside of the annotation.
We shall not annotate expressions like one-hour as in one-hour demonstration.
We call the topics addressed in a protest action issues. Issues are difficult to annotate on the level of words as there are many ways to express them or they are inferred from the described situation rather than explicitly stated.
A prototypical example of an issue is the subject of a demand, claim, grievance expressed by protesters.
A concept closely related to the issue is the trigger event -- the event that triggers a protest action.
The trigger event often helps infer the issue, together with the descriptions of the protest Event and other arguments. Because of this, we shall annotate trigger events as issues.
Here we shall discuss the most general types of issue expressions. An issue expression may belong to more than one type, e.g. a keyword that functions like an issue phrase.
In a newswire text, the issue is commonly expressed by the object of verbs like demand, oppose, protest (about/against/for), demonstrate against/for, campaign against/for, urge, say, symbolize, show, call on, draw attention to in the case where the subject is a protest actor.
Some 2000 protesters had [gathered] to demand "dignified living conditions" for migrants in the French port city of Calais.
Earlier this year, Jongeward [went] around and [gathered] signatures demanding that Gillingham abandon the mine.
Protesters opposing same-sex marriage have gathered in Sydney for a [protest], where they [clashed] with a small group of counter-protesters.
The protesters [expressed] their support for Palestinian prisoners and [called] for their release from Israeli jails.
[Chanting] slogans at the Skanderbeg Square and [waving] large Albanian flags, the protesters said the government is violating the constitution in reaching deals with Serbia and Montenegro.
The object can be a noun phrase like in examples 1, 3 and 4 or a clause like in examples 2 and 5.
(Recall that a noun phrase is made of a noun and the words that modify it. A clause is made of a verb and typically its subject and includes all the words that modify them.)
Similarly, the issue can be expressed by the object of nouns related to verbs above like protest, demonstration, campaign etc.
6. Conservation groups have united in [protest] against the planned new road.
7. Foreigners [attacked] in retaliation for Cologne New Year's Eve assaults
8. Police officers detain an activist who was taking part in a [demonstration] against migrants, [...]
Sometimes, the issue is, or can be inferred from, the event that triggers the protest. In our previous annotation work, we have found that annotators tend to associate descriptions of such trigger events with issues. Here are some examples:
9. Hong Kong [protests] turn into [riots] after police evict illegal food stalls in Mong Kok district
10. Hospitals in south-west London [...] prepare for junior doctor [strike] in case government talks fail
Often, the trigger event is expressed with a clause introduced by conjunctions like after, if, because, in case, unless. In such cases, the whole clause that starts after the conjunction shall be annotated as in the examples above.
In the case of event phrases, temporal conjunctions / prepositions like after additionally carry a strong causal meaning.
Sometimes, issues are expressed in direct quotes attributed to protesters like in example 11 or the second annotation of example 12.
11. On the village side of the gate, a mob was [picketing] and [chanting], Ireland for the Irish and Brits go Home.
12. Pro-Putin demonstrators in Moscow [hold] posters reading "Crimea is Russian land!"
Quotation marks signal the quote and should also be annotated.
The keyword or key phrase most directly and concisely describes the issue and is often attached to the Event anchor or protest Actor, e.g. mass immigration, gay marriage, Pro-Russian, anti-gay:
13. Anti-NATO protesters [rallied] in the Serbian city of Nis on Friday.
14. 10,000 expected at anti-immigration [rally] in Dresden.
Occasionally, a keyword occurs on the target of a protest action, e.g.:
15. Protesters [attacked] Russian embassy
To enforce consistency of annotations, we shall annotate issue expressions in this priority if multiple construction types apply:
issue phrase > event phrase > quote > key phrase
Whenever the issue is expressed by a higher priority construction type, the annotation rules for this type should be applied even if one can identify a construction of a lower priority type.
In practice, this means that if the annotation rule for the issue phrase tells you to annotate a full phrase, you cannot annotate just one keyword:
16. Protestors said they are [demanding] changes in U.S. immigration policy for Haitians.
Yet, we would annotate immigration when it occurs as a construction of the keyword / key phrase type, e.g.:
17. Arizona Immigration [Protests] Draw Hundreds
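The priority rule can be summarised as a small lookup. The following Python sketch is only an illustration: the function name `select_construction_type` is hypothetical, and deciding which construction types actually apply to an expression remains the annotator's job.

```python
# Construction types in decreasing priority, as given in the rule above.
PRIORITY = ["issue phrase", "event phrase", "quote", "key phrase"]


def select_construction_type(candidate_types):
    """Return the highest-priority type among those that apply, or None."""
    for construction_type in PRIORITY:
        if construction_type in candidate_types:
            return construction_type
    return None


# Example 16: "immigration" could be read as a keyword, but it sits inside
# an issue phrase, so the issue phrase rules apply.
print(select_construction_type({"key phrase", "issue phrase"}))

# Example 17: only the keyword reading applies.
print(select_construction_type({"key phrase"}))
```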
Issue phrases, event phrases, and quotes are expressed by noun phrases or clauses.
Keywords / key phrases are noun phrases or adjectives.
To re-iterate: A noun phrase is made of a noun and the words that modify it. A clause is made of a verb and typically its subject and includes all the words that modify them.
We shall annotate noun phrases just like we annotate protest size and time. Concretely, we shall annotate all the modifiers, and we shall not annotate:
1. articles, e.g.
Thousands [protested] the election fraud
2. prepositions against, for, about, etc. that link the noun to the verb or another noun, e.g.
Doctors and patients [protested] against plans to cut services at the hospital
Rights was formed by women trade unionists, who organised a [demonstration] for equal pay in 1969.
Staff at Bus Éireann are to ballot on industrial [action] due to concerns over the future of the Expressway service.
[Violence] flares after controversial Belfast vote over Union flag.
Note that the latter example is an event phrase (section 8.2.2.).
When we annotate clauses, we shall take all the words in the clause that follow the conjunction, and we shall not annotate the conjunction itself:
The protesters [demanded] that the two officers involved in the shooting be criminally charged.
The trade union subsequently alerted all workers to be on standby for industrial [action], unless the management withdrew its unfavourable proposals.
Note that because we annotate the full clause, we shall take all the words in the clause including the article of the subject like in both examples above.
An important special case occurs with verbs like demand, call on:
The protesters [called] on the Moldovan president to resign.
Protesters [urged] the President to stop deportations.
In a list of issues, we shall annotate each issue expression separately, provided that the expressions do not share words:
... [protested] against police brutality and racism
The protesters [called] on the Moldovan president and chief prosecutor to resign, [demanding] early elections and prompt measures against corruption.
However, when the expressions share words, they cannot be annotated separately:
... [protested] against social and political injustices
Only brief, slogan-like quotes shall count as quotes in the sense of 8.2.3. Often such quotes are marked by quotation marks, which we shall also annotate, or written in capitalized words.
We shall treat long quotes in direct speech as ordinary sentences.
In this part, we shall describe how to start using the latest version (13.06.2016) of the interface for the combined annotation of protest events at the document and word levels.
The interface works in Google Chrome. In Firefox, the dropdown menu in the left part of the navigation bar is quirky.
The document-level coding interface and brat are synchronized, and the logins are synchronized too. Logging in automatically logs you into your brat account. You see the same document in brat and the document-level coding part.
To navigate from one document to another, you can hit Previous Story / Next Story, or select the filename from the dropdown menu in the top left (this does not work in Firefox 47.0 on Ubuntu Linux).
Hit Add an event to add a row of input fields for coding an event. Let's zoom in on it:
A quick guide through some important elements of the event coding interface:
On the top left, you see the already familiar dropdown menu with filenames. It displays the name of the currently edited document, and you can verify that it is the same document name shown in brat.
An event is a row of dropdown menus and one free form input field for comments:
Action form | Location | Date | Actor | Issue | Protest size | Comment
The action form, location, date and size fields can only have a single value.
There can be at most two issues and multiple actor labels.
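As a way of summarising these constraints, here is a minimal Python sketch of one event row. The class name `CodedEvent` and its field names are assumptions made for this illustration; they do not describe how the interface stores events internally.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CodedEvent:
    """One row of the coding interface (hypothetical representation)."""
    # Single-valued fields.
    action_form: Optional[str] = None
    location: Optional[str] = None
    date: Optional[str] = None
    protest_size: Optional[str] = None
    # Multi-valued fields.
    actors: List[str] = field(default_factory=list)
    issues: List[str] = field(default_factory=list)
    # Free-form comment.
    comment: str = ""

    def validate(self):
        """Raise ValueError if the row has more than two issues."""
        if len(self.issues) > 2:
            raise ValueError("an event can have at most two issues")


# The blockade example from the introduction, coded as one row.
event = CodedEvent(
    action_form="blockade",
    location="Slovakia",
    date="02.03.2001",
    actors=["trade unions"],
    issues=["welfare"],
)
event.validate()
print(event)
```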
You add new events by pressing Add an event, and you can delete the last event by pressing Delete last event.
The interface saves your annotations automatically as soon as you add / delete an event or navigate to another story.
The date field takes a date range as its input. It knows the publication date of the currently edited document and allows you to pick some preconfigured dates from the list, like Publication date, Publication date - 1 day, etc. Go to Custom range to input a date not on this list.
When you code events mentioned without a specific date (e.g. bombings in Madrid last year), you should select a date range that more or less corresponds to this unspecific description and -- importantly -- check the box with the approximately equal sign. In this way, we know that the event took place within the range that you have indicated but does not necessarily span the whole range.
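The difference between an exact date range and an approximate one can be sketched as follows. This is an illustration in Python under our own assumptions: the names `DateRange`, `publication_date_offset` and `custom_range` are hypothetical and are not taken from the interface.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DateRange:
    """A coded date: a range plus the "approximately equal" checkbox."""
    start: date
    end: date
    approximate: bool = False


def publication_date_offset(publication_date, days_before=0):
    """Preconfigured choice such as "Publication date - 1 day"."""
    day = publication_date - timedelta(days=days_before)
    return DateRange(day, day)


def custom_range(start, end, approximate=False):
    """Custom range; set approximate=True for unspecific descriptions."""
    if end < start:
        raise ValueError("the range must not end before it starts")
    return DateRange(start, end, approximate)


publication_date = date(2001, 3, 2)

# An event reported as having happened the day before publication.
print(publication_date_offset(publication_date, days_before=1))

# An event mentioned as "last year": the whole year, flagged as approximate,
# because the event lies somewhere in the range without spanning it.
print(custom_range(date(2000, 1, 1), date(2000, 12, 31), approximate=True))
```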
This is how you code issues.
brat annotations
The whole purpose of our exercise is to learn how codes relate to text. As a way of explaining this relation, we shall use the following rule and link events coded in the document-level interface to their anchors in brat.
In the document-level interface, we have coded all events and they appear enumerated (1, 2, 3, ...).
In brat, we have exhaustively marked all event anchors.
In brat, we can put indices (1, 2, 3, ...) on event anchors.
We shall put the index of a coded event on an anchor if the anchor (together with its arguments: location, time, actors...) describes this event (see some examples below).
It is possible that an anchor does not have an index. It is also possible to attach multiple indices to the same anchor. The result should look something like this:
Let us consider some examples. In the following text, we should code 3 events all located in the same city in Romania: an act of political violence (1), a blockade (2), and a demonstration (3). The anchors protest and protesting in the first and fourth sentences refer to all these events at once. This is why we annotate these anchors with the indices of all three events.
Hundreds of steel workers protest (1)(2)(3) planned layoffs in western Romania
BUCHAREST, Romania (AP)
Hundreds of steel workers smashed (1) windows at ruling party offices and blocked (2) a major road Monday to protest thousands of planned layoffs in a western Romanian city.
About 400 workers were protesting (1)(2)(3) layoffs set to occur after the state-owned Hunedoara Steel Plant is sold to Indian-born steel magnate Lakshmi Mittal's LNM company this week. LNM also owns a major steel plant in the Danube port city of Galati in eastern Romania.
Workers marched (3) through the city of Hunedoara and lobbed (1) stones at the offices of the ruling Party of Social Democracy, shattering some windows. [...]
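The index assignment in the text above can also be written down as a simple data structure, which may help to see that one anchor can carry several indices and one event can be described by several anchors. The Python sketch below is only an illustration; the variable `anchor_indices` and the function `anchors_for_event` are hypothetical names and are not part of brat.

```python
# Indexed anchors from the text above, in reading order,
# each with the indices of the coded events it describes.
anchor_indices = [
    ("protest", [1, 2, 3]),
    ("smashed", [1]),
    ("blocked", [2]),
    ("protesting", [1, 2, 3]),
    ("marched", [3]),
    ("lobbed", [1]),
]


def anchors_for_event(anchor_indices, event_index):
    """List the anchors that carry the index of the given coded event."""
    return [anchor for anchor, indices in anchor_indices
            if event_index in indices]


# Event (1), the act of political violence, is described by four anchors.
print(anchors_for_event(anchor_indices, 1))
# Event (3), the demonstration, is described by three anchors.
print(anchors_for_event(anchor_indices, 3))
```

An anchor without any index would simply appear with an empty list.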
The next text contains a campaign-like event for which we have this rule. This kind of event comprises many events on the same theme in various city-level locations across a country or region. To remind you, the rule says that we code a campaign-like event only if the document does not mention all the events at city-level locations that are part of the campaign.
In this text, events (1) and (5) are campaign-like -- protests across cities of Spain. In the case of event (5), no constituting events at all are reported. As for event (1), likely protests in locations other than Madrid and Barcelona are not mentioned either.
Anti-war protests (1)(3)(4) draw hundreds of thousands in Spain
MADRID, March 20
Hundreds of thousands of Spaniards took (1)(3)(4) to the streets for the second week in succession Saturday to remember the victims of last week's train bombings (2) in Madrid and demand an end to the US-led occupation of Iraq.
Some 200,000 people marched (3) in Barcelona and while a similar demonstration (4) in Madrid was barely half that size it was no less vociferous.
But the numbers were nowhere near the 11.6 million who thronged (5) cities across Spain in an unprecedented show of solidarity the day after the March 11 attacks (2) here, the worst in Spain's history. [...]
Note that we should put the index of a campaign-like event only on the anchors that refer to this campaign, and not on any of the anchors that describe individual events that are part of the campaign -- because these anchors do not refer to the entire campaign. Therefore, in the fourth sentence, marched and demonstration do not carry event index (1).
It is possible that a campaign-like event includes campaign-like events. The next text, from which you see only an excerpt, speaks of a strike campaign, event (1), which comprises all other events from the text including event (2), the walkouts by court employees. Event (2) itself is a campaign: The text does not report any of its individual constituting events from city-level locations.
Greece hit by barrage of strikes (1)(2)(3) to squeeze embattled government
[...]
The latest walkouts (2) closed courts Thursday with court employees demanding higher salaries. Greece's most famous tourist site -- the Acropolis -- was also shut (3) Thursday by contract workers seeking permanent positions. [...]