Andrea Mazzucchi


From ‘Solved Problems’ to New Challenges: A Report on LDC Activities
Christopher Cieri | Mark Liberman | Stephanie Strassel | Denise DiPersio | Jonathan Wright | Andrea Mazzucchi
Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)


New Directions for Language Resource Development and Distribution
Christopher Cieri | Denise DiPersio | Mark Liberman | Andrea Mazzucchi | Stephanie Strassel | Jonathan Wright
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Despite the growth in the number of linguistic data centers around the world, their accomplishments and expansions, and the advances they have helped enable, the language resources that exist are a small fraction of those required to meet the goals of Human Language Technologies (HLT) for the world’s languages and the promises they offer: broad access to knowledge, direct communication across language boundaries, and engagement in a global community. Using the Linguistic Data Consortium as a focus case, this paper sketches the progress of data centers, summarizes recent activities, then turns to several issues that have received inadequate attention and proposes some new approaches to their resolution.


Technical Infrastructure at Linguistic Data Consortium: Software and Hardware Resources for Linguistic Data Creation
Kazuaki Maeda | Haejoong Lee | Stephen Grimes | Jonathan Wright | Robert Parker | David Lee | Andrea Mazzucchi
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

The Linguistic Data Consortium (LDC) at the University of Pennsylvania has participated as a data provider in a variety of government-sponsored programs that support the development of Human Language Technologies. As the number of projects has grown, the quantity and variety of the data LDC produces have increased dramatically in recent years. In this paper, we describe the technical infrastructure, both hardware and software, that LDC has built to support these complex, large-scale linguistic data creation efforts. Because it would not be possible to cover all aspects of LDC’s technical infrastructure in one paper, this paper focuses on recent developments. We also report on our plans for making our custom-built software resources available to the community as open source software, and introduce an initiative to collaborate with software developers outside LDC. We hope that our approaches and software resources will be useful to community members who take on similar challenges.