2014
The CLARIN Research Infrastructure: Resources and Tools for eHumanities Scholars
Erhard Hinrichs | Steven Krauwer
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)
CLARIN is the short name for the Common Language Resources and Technology Infrastructure, which aims at providing easy and sustainable access for scholars in the humanities and social sciences to digital language data and advanced tools to discover, explore, exploit, annotate, analyse or combine them, independent of where they are located. CLARIN is in the process of building a networked federation of European data repositories, service centers and centers of expertise, with single sign-on access for all members of the academic community in all participating countries. Tools and data from different centers will be interoperable so that data collections can be combined and tools from different sources can be chained to perform complex operations to support researchers in their work. Interoperability of language resources and tools in the federation of CLARIN Centers is ensured by adherence to TEI and ISO standards for text encoding, by the use of persistent identifiers, and by the observance of common protocols. The purpose of the present paper is to give an overview of language resources, tools, and services that CLARIN presently offers.
2010
Cooperation for Arabic Language Resources and Tools — The MEDAR Project
Bente Maegaard | Mohamed Attia | Khalid Choukri | Olivier Hamon | Steven Krauwer | Mustafa Yaseen
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)
The paper describes some of the work carried out within the European-funded project MEDAR. The project has three streams of activity: the technical stream, the cooperation stream and the dissemination stream. MEDAR first updated the existing surveys and the BLARK for Arabic, and the technical stream then focused on machine translation. The consortium identified a number of freely available MT systems and then customized two versions of the well-known MOSES package. The consortium also addressed the need to package MOSES for English to Arabic (while the main MT stream is on Arabic to English). For performance assessment purposes, the partners produced test data that allowed carrying out an evaluation campaign with 5 different systems (including some from outside the consortium) and two online ones. Both the MT baselines and the collected data will be made available via the ELRA catalogue. The cooperation stream focuses mostly on the Cooperation Roadmap for Human Language Technologies for Arabic, which is directed towards Arabic HLT in the region in general. The purpose of the roadmap is to outline areas and priorities for collaboration, both between EU countries and Arabic-speaking countries and in general: between countries, between universities, and last but not least between universities and industry.
2008
CLARIN: Common Language Resources and Technology Infrastructure
Tamás Váradi | Steven Krauwer | Peter Wittenburg | Martin Wynne | Kimmo Koskenniemi
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
The paper provides a general introduction to the CLARIN project, a large-scale European research infrastructure project designed to establish an integrated and interoperable infrastructure of language resources and technologies. The goal is to make language resources and technology much more accessible to all researchers working with language material, particularly non-expert users in the Humanities and Social Sciences. CLARIN intends to build a virtual, distributed infrastructure consisting of a federation of trusted digital archives and repositories where language resources and tools are accessible through web services. The CLARIN project consists of 32 partners from 22 countries and is currently engaged in the preparatory phase of developing the infrastructure. The paper describes the objectives of the project in terms of its technical, legal, linguistic and user dimensions.
MEDAR: Collaboration between European and Mediterranean Arabic Partners to Support the Development of Language Technology for Arabic
Bente Maegaard | Mohammed Atiyya | Khalid Choukri | Steven Krauwer | Chafic Mokbel | Mustafa Yaseen
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)
After the successful completion of the NEMLAR project (2003-2005), a new opportunity for a project was opened by the European Commission, and a group of largely the same partners is now executing the MEDAR project. MEDAR will update the existing surveys and BLARK for Arabic, and will then focus on machine translation (and other tools for translation) and information retrieval, with an emphasis on language resources, tools and evaluation for these applications. A very important part of the MEDAR project is to reinforce and extend the NEMLAR network and to create a cooperation roadmap for Human Language Technologies for Arabic. It is expected that the cooperation roadmap will attract wide attention from other parties and that it can help create a larger platform for collaborative projects. Finally, the project will focus on dissemination of knowledge about existing resources and tools, as well as actors and activities; this will happen through a newsletter, a website and an international conference which will follow up on the Cairo conference of 2004. Dissemination to user communities will also be important, e.g. through participation in translators' conferences. The goal of these activities is to create a stronger and lasting collaboration between EU countries and Arabic-speaking countries.
2007
Is MT in crisis?
Steven Krauwer
Proceedings of the 11th Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages: Plenaries
2006
Building Annotated Written and Spoken Arabic LRs in NEMLAR Project
M. Yaseen | M. Attia | B. Maegaard | K. Choukri | N. Paulsson | S. Haamid | S. Krauwer | C. Bendahman | H. Fersøe | M. Rashwan | B. Haddad | C. Mukbel | A. Mouradi | A. Al-Kufaishi | M. Shahin | N. Chenfour | A. Ragheb
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The NEMLAR project (Network for Euro-Mediterranean LAnguage Resource and human language technology development and support, www.nemlar.org) was a project supported by the EC with partners from Europe and Arabic countries, whose objective was to build a network of specialized partners to promote and support the development of Arabic Language Resources (LRs) in the Mediterranean region. The project focused on identifying the state of the art of LRs in the region, assessing priority requirements through consultations with language industry and communication players, and establishing a protocol for developing and identifying a Basic Language Resource Kit (BLARK) for Arabic. The BLARK is defined as the minimal set of language resources that is necessary to do any pre-competitive research and education, in addition to the development of crucial components for any future NLP industry. Following the identification of high-priority resources, the NEMLAR partners agreed to focus on and produce three main resources: 1) an annotated Arabic written corpus of about 500K words, 2) an Arabic speech corpus for TTS applications of 2x5 hours, and 3) an Arabic broadcast news speech corpus of 40 hours of Modern Standard Arabic. For each of the three LRs, the underlying linguistic models and assumptions, technical specifications, methodologies for collection and building, and validation and verification mechanisms were defined and applied.
The BLARK concept and BLARK for Arabic
Bente Maegaard | Steven Krauwer | Khalid Choukri | Lise Damsgaard Jørgensen
Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06)
The EU project NEMLAR (Network for Euro-Mediterranean LAnguage Resources) on Arabic language resources carried out two surveys on the availability of Arabic LRs in the region, and on industrial requirements. The project also worked out a BLARK (Basic Language Resource Kit) for Arabic. In this paper we describe the further development of the BLARK concept made during the work on a BLARK for Arabic, as well as the results for Arabic.
2001
Workshop on MT2010: Towards a Road Map for MT
Steven Krauwer
Workshop on MT2010: Towards a Road Map for MT
1996
“Is Speech Language?”
Joseph Mariani | Steven Krauwer
COLING 1996 Volume 2: The 16th International Conference on Computational Linguistics
1993
Sixth Conference of the European Chapter of the Association for Computational Linguistics
Steven Krauwer | Michael Moortgat | Louis des Tombe
Sixth Conference of the European Chapter of the Association for Computational Linguistics
1989
An Approach to Sentence-Level Anaphora in Machine Translation
Gertjan van Noord | Joke Dorrepaal | Doug Arnold | Steven Krauwer | Louisa Sadler | Louis des Tombe
Fourth Conference of the European Chapter of the Association for Computational Linguistics
1988
‘Relaxed’ compositionality in machine translation
Doug Arnold | Steven Krauwer | Louis des Tombe | Louisa Sadler
Proceedings of the Second Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages
1987
A Model for Preference
Dominique Petitpierre | Steven Krauwer | Louis des Tombe | Doug Arnold | Giovanni B. Varile
Third Conference of the European Chapter of the Association for Computational Linguistics
1986
The <C,A>,T Framework in Eurotra: A Theoretically Committed Notation for MT
D.J. Arnold | S. Krauwer | M. Rosner | L. des Tombe | G.B. Varile
Coling 1986 Volume 1: The 11th International Conference on Computational Linguistics
1985
A MUl View of the <C,A>, T Framework in EUROTRA
Doug Arnold | Lieven Jaspaert | Rod Johnson | Steven Krauwer | Mike Rosner | Louis des Tombe | Nino Varile | Susan Warwick
Proceedings of the first Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages
A Preliminary Linguistic Framework for EUROTRA, June 1985
Louis des Tombe | Doug Arnold | Lieven Jaspaert | Rod Johnson | Steven Krauwer | Mike Rosner | Nino Varile | Susan Warwick
Proceedings of the first Conference on Theoretical and Methodological Issues in Machine Translation of Natural Languages
1984
Transfer in a Multilingual MT System
Steven Krauwer | Louis des Tombe
10th International Conference on Computational Linguistics and 22nd Annual Meeting of the Association for Computational Linguistics