Xiaobin Chen


CTAP for Chinese: A Linguistic Complexity Feature Automatic Calculation Platform
Yue Cui | Junhui Zhu | Liner Yang | Xuezhi Fang | Xiaobin Chen | Yujie Wang | Erhong Yang
Proceedings of the Thirteenth Language Resources and Evaluation Conference

The construct of linguistic complexity has been widely used in language learning research. Several text analysis tools have been created to automatically analyze linguistic complexity. However, the indexes supported by several existing Chinese text analysis tools are limited and differ from tool to tool, because each tool was built for a different research purpose. CTAP is an open-source tool for extracting linguistic complexity measures that is intended to serve a wide range of research purposes. Although it was originally developed for English, the Unstructured Information Management (UIMA) framework it uses allows the integration of other languages. In this study, we integrate a Chinese component into CTAP, describe the index sets it incorporates, and compare it with three linguistic complexity tools for Chinese. The index set comprises 196 linguistic complexity indexes at four levels: character, word, sentence, and discourse. So far, CTAP supports automatic calculation of complexity features for four languages, with the aim of helping linguists without an NLP background study language complexity.


Using Broad Linguistic Complexity Modeling for Cross-Lingual Readability Assessment
Zarah Weiss | Xiaobin Chen | Detmar Meurers
Proceedings of the 10th Workshop on NLP for Computer Assisted Language Learning


Challenging learners in their individual zone of proximal development using pedagogic developmental benchmarks of syntactic complexity
Xiaobin Chen | Detmar Meurers
Proceedings of the joint workshop on NLP for Computer Assisted Language Learning and NLP for Language Acquisition


Characterizing Text Difficulty with Word Frequencies
Xiaobin Chen | Detmar Meurers
Proceedings of the 11th Workshop on Innovative Use of NLP for Building Educational Applications

CTAP: A Web-Based Tool Supporting Automatic Complexity Analysis
Xiaobin Chen | Detmar Meurers
Proceedings of the Workshop on Computational Linguistics for Linguistic Complexity (CL4LC)

Informed by research on readability and language acquisition, computational linguists have developed sophisticated tools for the analysis of linguistic complexity. While some tools are starting to become accessible on the web, there still is a disconnect between the features that can in principle be identified based on state-of-the-art computational linguistic analysis, and the analyses a second language acquisition researcher, teacher, or textbook writer can readily obtain and visualize for their own collection of texts. This short paper presents a web-based tool development that aims to meet this challenge. The Common Text Analysis Platform (CTAP) is designed to support fully configurable linguistic feature extraction for a wide range of complexity analyses. It features a user-friendly interface, modularized and reusable analysis component integration, and flexible corpus and feature management. Building on the Unstructured Information Management framework (UIMA), CTAP readily supports integration of state-of-the-art NLP and complexity feature extraction maintaining modularization and reusability. CTAP thereby aims at providing a common platform for complexity analysis, encouraging research collaboration and sharing of feature extraction components—to jointly advance the state-of-the-art in complexity analysis in a form that readily supports real-life use by ordinary users.