This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
Viet ThanhPham
Fixing paper assignments
Please select all papers that belong to the same person.
Indicate below which author they should be assigned to.
Opinion survey research is a crucial method used by social scientists for understanding societal beliefs and behaviors. Traditional methodologies often entail high costs and limited scalability, while current automated methods such as opinion synthesis exhibit severe biases and lack traceability. In this paper, we introduce SurveyPilot, a novel finite-state orchestrated agentic framework that automates the collection and analysis of human opinions from social media platforms. SurveyPilot addresses the limitations of pioneering approaches by (i) providing transparency and traceability in each state of opinion collection and (ii) incorporating several techniques for mitigating biases, notably with a novel genetic algorithm for improving result diversity. Our extensive experiments reveal that SurveyPilot achieves a close alignment with authentic survey results across multiple domains, observing average relative improvements of 68,98% and 51,37% when comparing to opinion synthesis and agent-based approaches. Implementation of SurveyPilot is available on https://github.com/thanhpv2102/SurveyPilot.
Despite achieving remarkable performance, machine translation (MT) research remains underexplored in terms of translating cultural elements in languages, such as idioms, proverbs, and colloquial expressions. This paper investigates the capability of state-of-the-art neural machine translation (NMT) and large language models (LLMs) in translating proverbs, which are deeply rooted in cultural contexts. We construct a translation dataset of standalone proverbs and proverbs in conversation for four language pairs. Our experiments show that the studied models can achieve good translation between languages with similar cultural backgrounds, and LLMs generally outperform NMT models in proverb translation. Furthermore, we find that current automatic evaluation metrics such as BLEU, CHRF++ and COMET are inadequate for reliably assessing the quality of proverb translation, highlighting the need for more culturally aware evaluation metrics.
Large language models, despite their remarkable success in recent years, still exhibit severe cultural bias. Therefore, in this paper, we introduce CultureInstruct, a large-scale instruction-tuning dataset designed to reduce cultural bias in LLMs. CultureInstruct is constructed with an automatic pipeline, utilizing public web sources and a specialized LLM to generate instruction. Our data comprises 430K instructions, ranging from classic NLP tasks to complex reasoning. CultureInstruct also covers 11 most relevant topics to cultural knowledge, making it highly diverse. Our experiments show that fine-tuning LLMs with CultureInstruct results in consistent improvements across three types of cultural benchmarks, including (i) general cultural knowledge, (ii) human opinions and values, and (iii) linguistic cultural bias. Our best model, Qwen2-Instruct 72B + CultureInstruct, outperforms GPT-4o Mini and GPT-4o with 18.47% and 13.07% average relative improvements on cultural benchmarks.
Sociocultural norms serve as guiding principles for personal conduct in social interactions within a particular society or culture. The study of norm discovery has seen significant development over the last few years, with various interesting approaches. However, it is difficult to adopt these approaches to discover norms in a new culture, as they rely either on human annotations or real-world dialogue contents. This paper presents a robust automatic norm discovery pipeline, which utilizes the cultural knowledge of GPT-3.5 Turbo (ChatGPT) along with several social factors. By using these social factors and ChatGPT, our pipeline avoids the use of human dialogues that tend to be limited to specific scenarios, as well as the use of human annotations that make it difficult and costly to enlarge the dataset. The resulting database - Multi-cultural Norm Base (MNB) - covers 6 distinct cultures, with over 150k sociocultural norm statements in total. A state-of-the-art Large Language Model (LLM), Llama 3, fine-tuned with our proposed dataset, shows remarkable results on various downstream tasks, outperforming models fine-tuned on other datasets significantly.