We include four new corpora for research purposes:

French Verbs,
Dutch Verbs,
Czech Verbs,
Czech Nouns.

For all sets, we remove multiword inflections and inflections containing hyphens.
If a slot in the inflection table has more than one form, we keep the first one.
Furthermore, for French and Dutch, we remove any tables that are incomplete.
For example, certain French verbs are not used in the imperative.
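
The cleaning rules above can be sketched roughly as follows. This is an illustration only, not the script used to build the corpora: the function name clean_table, the dict-of-lists table layout, and the expected_slots parameter are all hypothetical, and the order of the steps (filter first, then keep the first remaining form) is an assumption.

```python
def clean_table(table, expected_slots=None):
    """Clean one inflection table.

    table: dict mapping a slot name to a list of candidate forms
           (hypothetical layout, not the corpus file format).
    expected_slots: if given, tables that do not fill exactly these
           slots after cleaning are rejected (used for French and Dutch).
    """
    cleaned = {}
    for slot, forms in table.items():
        # Drop multiword forms and forms containing a hyphen.
        kept = [f for f in forms if " " not in f and "-" not in f]
        if kept:
            # When a slot has more than one form, keep the first one.
            cleaned[slot] = kept[0]
    # Reject incomplete tables when a full slot inventory is required.
    if expected_slots is not None and set(cleaned) != set(expected_slots):
        return None
    return cleaned
```

Calling clean_table without expected_slots corresponds to the Czech corpora, where incomplete tables are kept.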

French Verbs:

The verbal inflections included in this corpus were extracted from
Verbiste, an online French conjugation website.
Each table contains 48 inflections:

24: 6 persons each for the Indicative present, past, imperfect, and future
6: 6 persons each for the Conditional present
12: 6 persons each for the Subjunctive present and imperfect
3: 2nd person singular, 1st person plural and 2nd person plural Imperative present
2: Present and past participle
1: Infinitive

Dutch Verbs:

The inflections included in this corpus were taken from CELEX Version 2.0,
a lexical database.
CELEX contains subjunctive forms for some verbs, but they are sparse
and are not included in this corpus.
Each table contains 9 inflections:

2: A singular and plural inflection for the past tense
4: All three singular persons, as well as a general plural inflection for the
present tense
2: The present and past participle
1: Infinitive.

Czech:

The corpora for Czech were extracted from the Prague Dependency Treebank.
They are not complete, and incomplete tables have not been removed.  Unlike
the French and Dutch corpora, which contain features for tense, person,
mood, and type, the Czech forms contain only a single feature: tag.
The tag corresponds to the abstract (yet readable) inflectional tags used in our NAACL paper.

Nouns:

We make no claims as to the completeness of our tables,
but observe 17 different tags in the corpus. The tags mimic those
of the Treebank and are positional in nature, but have been simplified,
and some features have been combined.
Each tag consists of 2 markers that identify the inflection.
The first marker is the number of the noun, and can be
S, for singular, P, for plural, D, for dual, or X, for any number.

The second marker marks case:
1 = Nominative
2 = Genitive
3 = Dative
4 = Accusative
5 = Vocative
6 = Locative
7 = Instrumental
X = Any.

Although 32 different tags are possible (4 number markers x 8 case markers),
we observe only 17:

D7
P1
P2
P3
P4
P6
P7
PX
S1
S2
S3
S4
S5
S6
S7
SX
XX
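
As an illustration of the two-marker noun tags, the sketch below decodes a tag and lists the combinations that are possible but not observed. The names NUMBER, CASE, OBSERVED and decode_noun_tag are hypothetical and not part of the corpus distribution.

```python
from itertools import product

# First marker: number of the noun.
NUMBER = {"S": "singular", "P": "plural", "D": "dual", "X": "any"}

# Second marker: case.
CASE = {
    "1": "nominative", "2": "genitive", "3": "dative",
    "4": "accusative", "5": "vocative", "6": "locative",
    "7": "instrumental", "X": "any",
}

# The 17 tags listed above.
OBSERVED = "D7 P1 P2 P3 P4 P6 P7 PX S1 S2 S3 S4 S5 S6 S7 SX XX".split()


def decode_noun_tag(tag):
    """Return the number and case encoded by a two-character noun tag."""
    if len(tag) != 2:
        raise ValueError(f"noun tags have exactly 2 markers: {tag!r}")
    number, case = tag
    return {"number": NUMBER[number], "case": CASE[case]}


# All 4 x 8 = 32 marker combinations, and those never seen in the corpus.
possible = ["".join(pair) for pair in product(NUMBER, CASE)]
unobserved = sorted(set(possible) - set(OBSERVED))
```

For example, decode_noun_tag("D7") gives dual number and instrumental case.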


Verbs:

Like the Czech noun tags, the verb tags are positional,
but mark six features.
Some features are redundant,
but are included because of the positional
nature of the tags.  For any feature, -
means 'undefined', which is different from 'any'.

The first feature marks mood, and some tenses:

B = Present or Future Tense
C = Conditional mood
E = Transgressive present tense
M = Transgressive past tense
F = Infinitive
I = Imperative mood
P = Past participle, active
S = Past participle, passive


The second marks plurality:

S = Singular
P = Plural
X = Any
W = Singular for Feminine, Plural for Neuter

The third is person:

1 = 1st person
2 = 2nd person
3 = 3rd person
X = any person

The fourth is tense: 

F = Future
P = Present
R = Past
X = Any

The fifth is polarity:

A = Affirmative
N = Negative

The sixth is passivity:

A = Active
P = Passive

For example, PS2RAA indicates
an active past participle: singular, 2nd person, past tense, affirmative, active,
and

SPXXNP indicates a passive past participle: plural, any person, any tense, negative, passive.
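
The positional reading of the six-character verb tags can be sketched as a small decoder. The names FEATURES and decode_verb_tag, and the feature labels used as dictionary keys, are hypothetical and chosen only for this example.

```python
# One (feature name, value table) pair per tag position, in order.
FEATURES = [
    ("mood", {
        "B": "present or future tense", "C": "conditional mood",
        "E": "transgressive present tense", "M": "transgressive past tense",
        "F": "infinitive", "I": "imperative mood",
        "P": "past participle, active", "S": "past participle, passive",
    }),
    ("plurality", {
        "S": "singular", "P": "plural", "X": "any",
        "W": "singular for feminine, plural for neuter",
    }),
    ("person", {"1": "1st person", "2": "2nd person",
                "3": "3rd person", "X": "any"}),
    ("tense", {"F": "future", "P": "present", "R": "past", "X": "any"}),
    ("polarity", {"A": "affirmative", "N": "negative"}),
    ("passivity", {"A": "active", "P": "passive"}),
]


def decode_verb_tag(tag):
    """Map each position of a six-character verb tag to its meaning."""
    if len(tag) != len(FEATURES):
        raise ValueError(f"verb tags have exactly 6 positions: {tag!r}")
    decoded = {}
    for char, (name, values) in zip(tag, FEATURES):
        # '-' means the feature is undefined, which is not the same as 'any'.
        decoded[name] = "undefined" if char == "-" else values[char]
    return decoded
```

With this sketch, decode_verb_tag("C-----") reports the conditional mood and marks the other five features as undefined, while the X positions in SPXXNP are reported as 'any'.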

The following 54 tags are observed in the corpus:

BP1FAA
BP1FNA
BP1PAA
BP1PNA
BP2FAA
BP2FNA
BP2PAA
BP2PNA
BP3FAA
BP3FNA
BP3PAA
BP3PNA
BS1FAA
BS1FNA
BS1PAA
BS1PNA
BS2PAA
BS2PNA
BS3FAA
BS3FNA
BS3PAA
BS3PNA
BXXPAA
C-----
CP1---
CP2---
CS1---
CS2---
EP--A-
ES--A-
ES--N-
F---A-
F---N-
IP1-A-
IP1-N-
IP2-A-
IP2-N-
IS2-A-
IS2-N-
IS3-A-
MS--A-
PPXRAA
PPXRNA
PS2RAA
PSXRAA
PSXRNA
PWXRAA
PWXRNA
SPXXAP
SPXXNP
SSXXAP
SSXXNP
SWXXAP
SWXXNP

