Abstract
We describe an investigation into the identification and extraction of unrecorded potential lexical items in Japanese text by detecting text passages containing selected language patterns typically associated with such items. We identified a set of suitable patterns, then tested them on two large collections of text drawn from the WWW and Twitter. Samples of the extracted items were evaluated, and the evaluation showed that the approach has considerable potential for identifying terms for later lexicographic analysis.
- Anthology ID:
- 2018.gwc-1.19
- Volume:
- Proceedings of the 9th Global Wordnet Conference
- Month:
- January
- Year:
- 2018
- Address:
- Nanyang Technological University (NTU), Singapore
- Editors:
- Francis Bond, Piek Vossen, Christiane Fellbaum
- Venue:
- GWC
- SIG:
- SIGLEX
- Publisher:
- Global Wordnet Association
- Pages:
- 163–171
- URL:
- https://aclanthology.org/2018.gwc-1.19
- Cite (ACL):
- James Breen, Timothy Baldwin, and Francis Bond. 2018. The Company They Keep: Extracting Japanese Neologisms Using Language Patterns. In Proceedings of the 9th Global Wordnet Conference, pages 163–171, Nanyang Technological University (NTU), Singapore. Global Wordnet Association.
- Cite (Informal):
- The Company They Keep: Extracting Japanese Neologisms Using Language Patterns (Breen et al., GWC 2018)
- PDF:
- https://preview.aclanthology.org/improve-issue-templates/2018.gwc-1.19.pdf
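The abstract does not list the language patterns themselves. The Python sketch below shows the general idea of pattern-based candidate extraction. It rests on three assumptions: the patterns are hypothetical quoted-term constructions such as 「X」という言葉 ("the word X") and 「X」とは ("X means"), a plain set of known words stands in for existing dictionary coverage, and the names `PATTERNS` and `extract_candidates` are illustrative. None of these are taken from the paper.

```python
import re
from collections import Counter

# Hypothetical pattern set for illustration only; these are NOT the
# patterns identified in the paper. Group 1 captures the candidate term.
PATTERNS = [
    # 「X」という言葉/単語/用語 : "the word 'X'"
    re.compile(r"「([^「」]{2,20})」という(?:言葉|単語|用語)"),
    # 「X」とは : "'X' means ..." / "what is 'X'?"
    re.compile(r"「([^「」]{2,20})」とは"),
]


def extract_candidates(texts, known_words):
    """Count terms matched by any pattern, skipping words already in the lexicon."""
    counts = Counter()
    for text in texts:
        for pattern in PATTERNS:
            for match in pattern.finditer(text):
                term = match.group(1)
                # Only terms absent from the lexicon count as "unrecorded" candidates.
                if term not in known_words:
                    counts[term] += 1
    return counts


# Toy corpus and lexicon (invented examples, not data from the paper).
corpus = [
    "最近「ググる」という言葉をよく聞く。",
    "「推し活」とは何ですか。",
    "「推し活」という言葉が流行っている。",
    "「携帯」という言葉は古い。",
]
lexicon = {"携帯"}

result = extract_candidates(corpus, lexicon)
print(result.most_common())
```

Because the quotation brackets delimit the term, this toy version avoids word segmentation entirely. A real system would also need frequency thresholds and manual review of the ranked candidates, which the abstract's sample-based evaluation suggests.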