Entities are taken from Wikipedia and mapped to Freebase MIDs. 
We then take the entity types from Freebase and map them to FIGER types. 
There are 2,184,265 entities and 102 FIGER types in the full dataset. 

For efficiency in running experiments, we randomly sampled a subset of entities and
split them into train (74,543), dev (35,275), and test (50,265) sets. 

Each line of the dataset is one entity, formatted as:

Freebase_MID   English_name   | list_of_lang_to_name |   FIGER_types


Here is an example line for the entity Emirate of Beihan: 
/m/07n_x0   Emirate_of_Beihan  | ru:Бейхан_(эмират) en:Emirate_of_Beihan nl:Emiraat_Beihan hr:Emirat_Bejhan de:Emirat_Beihan tr:Bihan_Emirliği it:Emirato_di_Beihan sh:Emirat_Bejhan wikidata:Q2485209 uk:Бейхан_(емірат) ar:إمارة_بيحان |   /location/country /location
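The line format above can be parsed by splitting on the two literal "|" delimiters and then on whitespace. A minimal sketch in Python (the helper name `parse_entity_line` is our own, not part of any released tooling; it assumes names never contain whitespace, which holds for the underscore-joined Wikipedia titles shown above):

```python
def parse_entity_line(line):
    """Parse one dataset line of the form:
    Freebase_MID   English_name   | lang:name ... |   FIGER_types
    """
    head, names_part, types_part = line.split("|")
    mid, english_name = head.split()
    # Each token in the middle field is "lang:Localized_Name";
    # split only on the first colon so the name part stays intact.
    names = dict(tok.split(":", 1) for tok in names_part.split())
    types = types_part.split()
    return {"mid": mid, "name": english_name, "names": names, "types": types}

example = ("/m/07n_x0   Emirate_of_Beihan  | ru:Бейхан_(эмират) "
           "en:Emirate_of_Beihan de:Emirat_Beihan wikidata:Q2485209 |"
           "   /location/country /location")
record = parse_entity_line(example)
```

Note that the middle field also carries a `wikidata:QID` token alongside the language codes, so consumers may want to pop that key out of the name dictionary.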

We did not include the dev split because of the 10 MB limit on supplementary material. 



