Abstract
We describe our use of RSS news feeds to quickly assemble a parallel English-Japanese corpus. Our method is simpler than other web mining approaches, and it produces a parallel corpus whose quality, quantity, and rate of growth are stable and predictable.
- Anthology ID:
- 2005.mtsummit-ebmt.8
- Volume:
- Workshop on example-based machine translation
- Month:
- September 13-15
- Year:
- 2005
- Address:
- Phuket, Thailand
- Venue:
- MTSummit
- Pages:
- 59–62
- URL:
- https://aclanthology.org/2005.mtsummit-ebmt.8
- Cite (ACL):
- John Fry. 2005. Assembling a Parallel Corpus from RSS News Feeds. In Workshop on example-based machine translation, pages 59–62, Phuket, Thailand.
- Cite (Informal):
- Assembling a Parallel Corpus from RSS News Feeds (Fry, MTSummit 2005)
- PDF:
- https://preview.aclanthology.org/paclic-22-ingestion/2005.mtsummit-ebmt.8.pdf