Abstract
Bracketed corpora are a very useful resource for natural language processing, but they are hard to build efficiently, so the corpora that exist are often too small for practical use. Disparities in morphological information, such as word segmentation and part-of-speech tag sets, are also troublesome: an application built for one corpus often cannot be applied to another. In this paper, we sketch out a method to build a corpus that has a fixed syntactic structure but varying morphological annotation, depending on the tag set scheme used. Our system uses a two-layered grammar: one layer consists of replaceable, tag-set-dependent rules, while the other has no such tag set dependency. The input sentences to our system are bracketed according to the structural information of the corpus. The parser can work with any tag set and grammar, and by using the same input bracketing, we obtain corpora that share partial syntactic structure.
- Anthology ID:
- 1999.mtsummit-1.80
- Volume:
- Proceedings of Machine Translation Summit VII
- Month:
- September 13-17
- Year:
- 1999
- Address:
- Singapore, Singapore
- Venue:
- MTSummit
- Pages:
- 543–546
- URL:
- https://aclanthology.org/1999.mtsummit-1.80
- Cite (ACL):
- Masahiro Ueki, Takenobu Tokunaga, and Hozumi Tanaka. 1999. Sharing syntactic structures. In Proceedings of Machine Translation Summit VII, pages 543–546, Singapore, Singapore.
- Cite (Informal):
- Sharing syntactic structures (Ueki et al., MTSummit 1999)
- PDF:
- https://preview.aclanthology.org/add_acl24_videos/1999.mtsummit-1.80.pdf
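The abstract describes a two-layered grammar in which a replaceable, tag-set-dependent layer sits under a tag-set-independent layer, with the parser driven by a fixed input bracketing. The abstract does not give the actual formalism, rules, or tag sets, so the following is only a minimal sketch under stated assumptions. The names `INDEPENDENT_RULES`, `TAGSET_A`, `TAGSET_B`, `label` and `skeleton`, and all categories and tags, are hypothetical. The sketch assumes every bracket is fully specified and that each sequence of child categories matches exactly one rule. It ignores ambiguity and differing word segmentation, both of which a real system would have to handle.

```python
from typing import Dict, List, Tuple, Union

# A leaf is (word, tag); an internal node is a list of children (one bracket).
Leaf = Tuple[str, str]
Bracketed = Union[Leaf, List["Bracketed"]]
# A labelled tree is (category, word) for preterminals or (category, children).
Tree = Tuple[str, Union[str, List["Tree"]]]

# Hypothetical tag-set-independent layer: rules over abstract categories only.
INDEPENDENT_RULES: Dict[Tuple[str, ...], str] = {
    ("Det", "N"): "NP",
    ("V", "NP"): "VP",
    ("NP", "VP"): "S",
}

# Hypothetical tag-set-dependent layers: one replaceable mapping per tag set.
TAGSET_A: Dict[str, str] = {"DT": "Det", "NN": "N", "VBZ": "V"}
TAGSET_B: Dict[str, str] = {"det": "Det", "noun": "N", "verb-3sg": "V"}


def label(node: Bracketed, tag_layer: Dict[str, str]) -> Tree:
    """Assign a category to every bracket of the input, bottom-up."""
    if isinstance(node, tuple):
        word, tag = node
        # Preterminal category comes from the swappable tag-set layer.
        return (tag_layer[tag], word)
    children = [label(child, tag_layer) for child in node]
    key = tuple(cat for cat, _ in children)
    if key not in INDEPENDENT_RULES:
        raise ValueError(f"no tag-set-independent rule for {key}")
    return (INDEPENDENT_RULES[key], children)


def skeleton(tree: Tree) -> Union[str, Tuple]:
    """Drop words and tags, keeping only the shared category structure."""
    cat, body = tree
    if isinstance(body, str):
        return cat
    return (cat, tuple(skeleton(child) for child in body))


# The same bracketing, annotated with two different (hypothetical) tag sets.
sent_a = [[("the", "DT"), ("dog", "NN")],
          [("chases", "VBZ"), [("a", "DT"), ("cat", "NN")]]]
sent_b = [[("the", "det"), ("dog", "noun")],
          [("chases", "verb-3sg"), [("a", "det"), ("cat", "noun")]]]

tree_a = label(sent_a, TAGSET_A)
tree_b = label(sent_b, TAGSET_B)
print(skeleton(tree_a) == skeleton(tree_b))
```

Only the tag-set layer changes between the two runs, while the independent rules and the bracketing are shared, so both annotations yield the same category skeleton. Swapping in a new tag set in this sketch amounts to supplying another mapping dictionary.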