Jianqiang Ma


2021

Frustratingly Simple Few-Shot Slot Tagging
Jianqiang Ma | Zeyu Yan | Chang Li | Yang Zhang
Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021

Inconsistency Matters: A Knowledge-guided Dual-inconsistency Network for Multi-modal Rumor Detection
Mengzhu Sun | Xi Zhang | Jianqiang Ma | Yazheng Liu
Findings of the Association for Computational Linguistics: EMNLP 2021

Rumor spreaders are increasingly utilizing multimedia content to attract the attention and trust of news consumers. Though several rumor detection models have exploited multi-modal data, they seldom consider the inconsistent relationships between images and texts. Moreover, they fail to find a powerful way to spot inconsistencies between the post contents and background knowledge. Motivated by the intuition that rumors are more likely to contain semantic inconsistencies, a novel Knowledge-guided Dual-inconsistency network is proposed to detect rumors with multimedia contents. It can capture the inconsistent semantics at the cross-modal level and the content-knowledge level in one unified framework. Extensive experiments on two public real-world datasets demonstrate that our proposal can outperform the state-of-the-art baselines.

2020

Mention Extraction and Linking for SQL Query Generation
Jianqiang Ma | Zeyu Yan | Shuai Pang | Yang Zhang | Jianping Shen
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

On the WikiSQL benchmark, state-of-the-art text-to-SQL systems typically take a slot-filling approach by building several dedicated models for each type of slot. Such modularized systems are not only complex but also of limited capacity for capturing inter-dependencies among SQL clauses. To solve these problems, this paper proposes a novel extraction-linking approach, where a unified extractor recognizes all types of slot mentions appearing in the question sentence before a linker maps the recognized columns to the table schema to generate executable SQL queries. Trained with automatically generated annotations, the proposed method achieves first place on the WikiSQL benchmark.

SQL Generation via Machine Reading Comprehension
Zeyu Yan | Jianqiang Ma | Yang Zhang | Jianping Shen
Proceedings of the 28th International Conference on Computational Linguistics

Text-to-SQL systems offer natural language interfaces to databases, which can automatically generate SQL queries given natural language questions. On the WikiSQL benchmark, state-of-the-art text-to-SQL systems typically take a slot-filling approach by building several specialized models for each type of slot. Despite being effective, such modularized systems are complex and also fall short in learning different slots jointly. To solve these problems, this paper proposes a novel approach that formulates the task as a question answering problem, where different slots are predicted by a unified machine reading comprehension (MRC) model. For this purpose, we use a BERT-based MRC model, which can also benefit from intermediate training on other MRC datasets. The proposed method can achieve competitive results on WikiSQL, suggesting that it is a promising direction for text-to-SQL.

FASTMATCH: Accelerating the Inference of BERT-based Text Matching
Shuai Pang | Jianqiang Ma | Zeyu Yan | Yang Zhang | Jianping Shen
Proceedings of the 28th International Conference on Computational Linguistics

Recently, pre-trained language models such as BERT have shown state-of-the-art accuracies in text matching. When applied to IR (or QA), BERT-based matching models need to calculate the representations and interactions for all query-candidate pairs online. The high inference cost has prohibited the deployment of BERT-based matching models in many practical applications. To address this issue, we propose a novel BERT-based text matching model, in which the representations and the interactions are decoupled. The representations of the candidates can then be calculated and stored offline, and directly retrieved during the online matching phase. To conduct the interactions and generate final matching scores, a lightweight attention network is designed. Experiments based on several large-scale text matching datasets show that the proposed model, called FASTMATCH, can achieve up to 100X speed-up over BERT and RoBERTa at the online matching phase, while keeping up to 98.7% of the performance.

2017

PP Attachment: Where do We Stand?
Daniël de Kok | Jianqiang Ma | Corina Dima | Erhard Hinrichs
Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 2, Short Papers

Prepositional phrase (PP) attachment is a well-known challenge to parsing. In this paper, we combine the insights of different works, namely: (1) treating PP attachment as a classification task with an arbitrary number of attachment candidates; (2) using auxiliary distributions to augment the data beyond the hand-annotated training set; (3) using topological fields to get information about the distribution of PP attachment throughout clauses; and (4) using state-of-the-art techniques such as word embeddings and neural networks. We show that jointly using these techniques leads to substantial improvements. We also conduct a qualitative analysis to gauge where the ceiling of the task is in a realistic setup.

2016

Learning Phone Embeddings for Word Segmentation of Child-Directed Speech
Jianqiang Ma | Çağrı Çöltekin | Erhard Hinrichs
Proceedings of the 7th Workshop on Cognitive Aspects of Computational Language Learning

Letter Sequence Labeling for Compound Splitting
Jianqiang Ma | Verena Henrich | Erhard Hinrichs
Proceedings of the 14th SIGMORPHON Workshop on Computational Research in Phonetics, Phonology, and Morphology

2015

Accurate Linear-Time Chinese Word Segmentation via Embedding Matching
Jianqiang Ma | Erhard Hinrichs
Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)

2014

Automatic Refinement of Syntactic Categories in Chinese Word Structures
Jianqiang Ma
Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14)

Annotated word structures are useful for various Chinese NLP tasks, such as word segmentation, POS tagging and syntactic parsing. Chinese word structures are often represented by binary trees, the nodes of which are labeled with syntactic categories, due to the syntactic nature of Chinese word formation. It is desirable to refine the annotation by labeling nodes of word structure trees with more appropriate syntactic categories so that the combinatorial properties in the word formation process are better captured. This can lead to improved performance on the tasks that exploit word structure annotations. We propose syntactically inspired algorithms to automatically induce syntactic categories of word structure trees using a POS-tagged corpus and the branching in existing Chinese word structure trees. We evaluate the quality of our annotation by comparing the performance of models based on our annotation with that of models based on another publicly available annotation. The results on two variations of the Chinese word segmentation task show that using our annotation can lead to significant performance improvements.

2012

Phrase-Based Approach for Adaptive Tokenization
Jianqiang Ma | Dale Gerdemann
Proceedings of the Twelfth Meeting of the Special Interest Group on Computational Morphology and Phonology

Semi-automatic Annotation of Chinese Word Structure
Jianqiang Ma | Chunyu Kit | Dale Gerdemann
Proceedings of the Second CIPS-SIGHAN Joint Conference on Chinese Language Processing

2009

A generalized method for iterative error mining in parsing results
Daniël de Kok | Jianqiang Ma | Gertjan van Noord
Proceedings of the 2009 Workshop on Grammar Engineering Across Frameworks (GEAF 2009)