Kathleen Egan


Machine Translation Revisited: An Operational Reality Check
Kathleen Egan
Proceedings of the 10th Conference of the Association for Machine Translation in the Americas: Government MT User Program

The government and the research community have strived for the past few decades to develop machine translation capabilities. Historically, DARPA took the lead in the grand challenge of surpassing human translation quality. While we have made strides from rule-based to statistical and hybrid machine translation engines, we cannot rely solely on machine translation to overcome the language barrier and accomplish the mission. Machine translation is often misunderstood or misplaced in operational settings because expectations are unrealistic and optimization is not achieved. With the increase in volume, variety, and velocity of data, new paradigms are needed for choosing machine translation software and embedding it into a business process so as to achieve operational goals. The talk will focus on the operational requirements and frame where, when, and how to use machine translation. We will also outline some gaps and suggest new areas for research, development, and implementation.


Utilizing Automated Translation with Quality Scores to Increase Productivity
Daniel Marcu | Kathleen Egan | Chuck Simmons | Ning-Ning Mahlmann
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program

Automated translation can assist with a variety of translation needs in government, from speeding up access to information for intelligence work to helping human translators increase their productivity. However, government entities need a mechanism that tells them whether or not they can trust the output of automated translation solutions. In this presentation, Language Weaver will present a new capability, "TrustScore": an automated scoring algorithm that communicates how good the automated translation is, using a meaningful metric. With this capability, each translation is automatically assigned a TrustScore from 1 to 5. A score of 1 would indicate that the translation is unintelligible; a score of 3 would indicate that meaning has been conveyed and that the translated content is actionable. A score approaching 4 or higher would indicate that meaning and nuance have been carried through. This automatic prediction of quality has been validated by testing done across significant numbers of data points in different companies and on different types of content. After outlining TrustScore and how it works, Language Weaver will discuss how a scoring mechanism like TrustScore could be used in a translation productivity workflow in government to assist linguists with day-to-day translation work. This would enable them to further benefit from their investments in automated translation software. Language Weaver will also share how TrustScore is used in commercial deployments to cost-effectively publish information in near real time.

Cross Lingual Arabic Blog Alerting (COLABA)
Kathleen Egan
Proceedings of the 9th Conference of the Association for Machine Translation in the Americas: Government MT User Program

Social media and tools for communication over the Internet have expanded a great deal in recent years. This expansion offers a diverse set of users a means to communicate more freely and spontaneously in mixed languages and genres (blogs, message boards, chat, texting, video and images). Dialectal Arabic is pervasive in written social media; however, current state-of-the-art tools made for Modern Standard Arabic (MSA) fail on Arabic dialects. COLABA enables MSA users to interpret dialects correctly. It helps find Arabic colloquial content that is currently not easily searchable and accessible to MSA queries. The COLABA team has built a suite of tools that will offer users the ability to anonymously capture online unstructured media content from blogs and to comprehend, organize, and validate content from informal and colloquial genres of online communication in MSA and a variety of Arabic dialects. The DoD/Combating Terrorism Technical Support Office/Technical Support Working Group (CTTSO/TSWG) awarded the contract to Acxiom Corporation and partners from MTI/IBM, Columbia University, Janya and Wichita State University to bring joint expertise to address this challenge. The suite has several applications: supporting language and cultural learning by making colloquial Arabic intelligible to students of MSA; retrieval and prioritization for triage and content analysis by finding Arabic colloquial and dialect terms that today's search engines miss, by providing appropriate interpretations of colloquial Arabic, which is opaque to current analytics approaches, and by identifying named entities, events, topics, and sentiment; and enabling improved translations by MSA-trained MT systems through decreases in out-of-vocabulary terms, achieved by converting colloquial terms to MSA.


User-centered MT Development and Implementation
Kathleen Egan | Francis Kubala | Allen Sears
Proceedings of the 8th Conference of the Association for Machine Translation in the Americas: Government and Commercial Uses of MT


The foreign language challenge in the USG and machine translation.
Kathleen Egan
Proceedings of the Workshop on Machine translation in practice: from old guard to new guard

The internet is no longer English only. The data is voluminous and the number of proficient linguists cannot match the day-to-day needs of several government agencies. Handling foreign languages is not limited to translating documents but goes beyond journalistic written formats. Military, diplomatic and official interactions in the US and abroad require more than one or two foreign language skills. The challenge is both managing the user’s expectations and stimulating new areas for MT research and development.