Assaf Urieli


2024

CorpusArièja: Building an Annotated Corpus with Variation in Occitan
Clamenca Poujade | Myriam Bras | Assaf Urieli
Proceedings of the 3rd Annual Meeting of the Special Interest Group on Under-resourced Languages @ LREC-COLING 2024

The Occitan language is a less-resourced language and is classified as ‘in danger’ by UNESCO. It is therefore important to build resources and tools that can help safeguard the language and develop its digitisation. CorpusArièja is a collection of 72 texts (just over 41,000 tokens) in the Occitan of the French department of Ariège. The majority of the texts needed to be digitised and processed with Optical Character Recognition (OCR). This corpus contains dialectal and spelling variation, but is limited to prose, without diachronic or genre variation. It is annotated with two levels of lemmatisation, POS tags and verbal inflection. One of the main aims of the corpus is to enable the design of tools that can automatically annotate all Occitan texts, regardless of the dialect or spelling used. The Ariège territory is interesting because it exhibits the two types of variation that we focus on, dialectal and spelling, and because it has many authors who write in their native language, their own variety of Occitan.

2017

Non-Projectivity in Serbian: Analysis of Formal and Linguistic Properties
Aleksandra Miletic | Assaf Urieli
Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017)

2015

Stratégies pour l’étiquetage et l’analyse syntaxique statistique de phénomènes difficiles en français : études de cas avec Talismane [Strategies for statistical POS-tagging and parsing of difficult phenomena in French: case studies using Talismane]
Assaf Urieli
Traitement Automatique des Langues, Volume 56, Numéro 1 : Varia [Varia]

2014

Pos-tagging different varieties of Occitan with single-dialect resources
Marianne Vergez-Couret | Assaf Urieli
Proceedings of the First Workshop on Applying NLP Tools to Similar Languages, Varieties and Dialects

Improving the parsing of French coordination through annotation standards and targeted features
Assaf Urieli
Proceedings of the First Joint Workshop on Statistical Parsing of Morphologically Rich Languages and Syntactic Analysis of Non-Canonical Languages

Better pos-tagging for “que” through targeted features and rules (Améliorer l’étiquetage de “que” par les descripteurs ciblés et les règles) [in French]
Assaf Urieli
Proceedings of TALN 2014 (Volume 1: Long Papers)

2013

Applying a Beam Search to Transition-Based Dependency Parsing: A Case Study for French with the Talismane Suite (L’apport du faisceau dans l’analyse syntaxique en dépendances par transitions : études de cas avec l’analyseur Talismane) [in French]
Assaf Urieli | Ludovic Tanguy
Proceedings of TALN 2013 (Volume 1: Long Papers)