Ziqi Zhang

2016

pdf abs
JATE 2.0: Java Automatic Term Extraction with Apache Solr
Ziqi Zhang | Jie Gao | Fabio Ciravegna
Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16)

Automatic Term Extraction (ATE) or Recognition (ATR) is a fundamental processing step preceding many complex knowledge engineering tasks. However, few methods have been implemented as public tools and in particular, available as open-source freeware. Further, little effort is made to develop an adaptable and scalable framework that enables customization, development, and comparison of algorithms under a uniform environment. This paper introduces JATE 2.0, a complete remake of the free Java Automatic Term Extraction Toolkit (Zhang et al., 2008) delivering new features including: (1) highly modular, adaptable and scalable ATE thanks to integration with Apache Solr, the open source free-text indexing and search platform; (2) an extended collection of state-of-the-art algorithms. We carry out experiments on two well-known benchmarking datasets and compare the algorithms along the dimensions of effectiveness (precision) and efficiency (speed and memory consumption). To the best of our knowledge, this is by far the only free ATE library offering a flexible architecture and the most comprehensive collection of algorithms.

2013

pdf
Mining Equivalent Relations from Linked Data
Ziqi Zhang | Anna Lisa Gentile | Isabelle Augenstein | Eva Blomqvist | Fabio Ciravegna
Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

2012

pdf abs
Automatically Extracting Procedural Knowledge from Instructional Texts using Natural Language Processing
Ziqi Zhang | Philip Webster | Victoria Uren | Andrea Varga | Fabio Ciravegna
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12)

Procedural knowledge is the knowledge required to perform certain tasks, and forms an important part of expertise. A major source of procedural knowledge is natural language instructions. While these readable instructions have been useful learning resources for human, they are not interpretable by machines. Automatically acquiring procedural knowledge in machine interpretable formats from instructions has become an increasingly popular research topic due to their potential applications in process automation. However, it has been insufficiently addressed. This paper presents an approach and an implemented system to assist users to automatically acquire procedural knowledge in structured forms from instructions. We introduce a generic semantic representation of procedures for analysing instructions, using which natural language techniques are applied to automatically extract structured procedures from instructions. The method is evaluated in three domains to justify the generality of the proposed semantic representation as well as the effectiveness of the implemented automatic system.

2011

pdf
Harnessing different knowledge sources to measure semantic relatedness under a uniform model
Ziqi Zhang | Anna Lisa Gentile | Fabio Ciravegna
Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing

2010

pdf abs
Improving Domain-specific Entity Recognition with Automatic Term Recognition and Feature Extraction
Ziqi Zhang | José Iria | Fabio Ciravegna
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Domain specific entity recognition often relies on domain-specific knowledge to improve system performance. However, such knowledge often suffers from limited domain portability and is expensive to build and maintain. Therefore, obtaining it in a generic and unsupervised manner would be a desirable feature for domain-specific entity recognition systems. In this paper, we introduce an approach that exploits domain-specificity of words as a form of domain-knowledge for entity-recognition tasks. Compared to prior work in the field, our approach is generic and completely unsupervised. We empirically show an improvement in entity extraction accuracy when features derived by our unsupervised method are used, with respect to baseline methods that do not employ domain knowledge. We also compared the results against those of existing systems that use manually crafted domain knowledge, and found them to be competitive.

pdf abs
A Random Graph Walk based Approach to Computing Semantic Relatedness Using Knowledge from Wikipedia
Ziqi Zhang | Anna Lisa Gentile | Lei Xia | José Iria | Sam Chapman
Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10)

Determining semantic relatedness between words or concepts is a fundamental process to many Natural Language Processing applications. Approaches for this task typically make use of knowledge resources such as WordNet and Wikipedia. However, these approaches only make use of limited number of features extracted from these resources, without investigating the usefulness of combining various different features and their importance in the task of semantic relatedness. In this paper, we propose a random walk model based approach to measuring semantic relatedness between words or concepts, which seamlessly integrates various features extracted from Wikipedia to compute semantic relatedness. We empirically study the usefulness of these features in the task, and prove that by combining multiple features that are weighed according to their importance, our system obtains competitive results, and outperforms other systems on some datasets.

2009

pdf
Too Many Mammals: Improving the Diversity of Automatically Recognized Terms
Ziqi Zhang | Lei Xia | Mark A. Greenwood | José Iria
Proceedings of the International Conference RANLP-2009

pdf bib
A Novel Approach to Automatic Gazetteer Generation using Wikipedia
Ziqi Zhang | José Iria
Proceedings of the 2009 Workshop on The People’s Web Meets NLP: Collaboratively Constructed Semantic Resources (People’s Web)

2008

pdf abs
A Comparative Evaluation of Term Recognition Algorithms
Ziqi Zhang | Jose Iria | Christopher Brewster | Fabio Ciravegna
Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08)

Automatic Term recognition (ATR) is a fundamental processing step preceding more complex tasks such as semantic search and ontology learning. From a large number of methodologies available in the literature only a few are able to handle both single and multi-word terms. In this paper we present a comparison of five such algorithms and propose a combined approach us¬ing a voting mechanism. We evaluated the six approaches using two different corpora and show how the voting algo¬rithm performs best on one corpus (a collection of texts from Wikipedia) and less well using the Genia corpus (a standard life science corpus). This indicates that choice and design of corpus has a major impact on the evaluation of term recog¬nition algorithms. Our experiments also showed that single-word terms can be equally important and occupy a fairly large proportion in certain domains. As a result, algorithms that ignore single-word terms may cause problems to tasks built on top of ATR. Effective ATR systems also need to take into account both the unstructured text and the structured aspects and this means information extraction techniques need to be integrated into the term recognition process.