This is an internal, incomplete preview of a proposed change to the ACL Anthology.
For efficiency reasons, we don't generate MODS or Endnote formats, and the preview may be incomplete in other ways, or contain mistakes.
Do not treat this content as an official publication.
MasaharuYoshioka
Fixing paper assignments
Please select all papers that do not belong to this person.
Indicate below which author they should be assigned to.
This paper aims to augment fans’ ability to critique and exploreinformation related to celebrities of interest. First, we collect postsfrom X (formerly Twitter) that discuss matters related to specificcelebrities. For the collection of major impressions from these posts,we employ ChatGPT as a large language model (LLM) to analyze andsummarize key sentiments. Next, based on collected impressions, wesearch for Web pages and collect the content of the top 30 ranked pagesas the source for exploring the reasons behind those impressions. Oncethe Web page content collection is complete, we collect and aggregatedetailed reasons for the impressions on the celebrities from the contentof each page. For this part, we continue to use ChatGPT, enhanced bythe retrieval augmented generation (RAG) framework, to ensure thereliability of the collected results compared to relying solely on theprior knowledge of the LLM. Evaluation results by comparing a referencethat is manually collected and aggregated reasons with those predictedby ChatGPT revealed that ChatGPT achieves high accuracy in reasoncollection and aggregation. Furthermore, we compared the performance ofChatGPT with an existing model of mT5 in reason collection and confirmedthat ChatGPT exhibits superior performance.
Survey research using open-ended responses is an important method thatcontributes to the discovery of unknown issues and new needs. However,survey research generally requires time and cost-consuming manual dataprocessing, indicating that it is difficult to analyze large dataset.To address this issue, we propose an LLM-based method to automate partsof the grounded theory approach (GTA), a representative approach of thequalitative data analysis. We generated and annotated pseudo open-endedresponses, and used them as the training data for the coding proceduresof GTA. Through evaluations, we showed that the models trained withpseudo open-ended responses are quite effective compared with thosetrained with manually annotated open-ended responses. We alsodemonstrate that the LLM-based approach is highly efficient andcost-saving compared to human-based approach.
Search engine is an important tool of modern academic study, but the results are lack of measurement of beginner friendliness. In order to improve the efficiency of using search engine for academic study, it is necessary to invent a technique of measuring the beginner friendliness of a Web page explaining academic concepts and to build an automatic measurement system. This paper studies how to integrate heterogeneous features such as a neural image feature generated from the image of the Web page by a variant of CNN (convolutional neural network) as well as text features extracted from the body text of the HTML file of the Web page. Integration is performed through the framework of the SVM classifier learning. Evaluation results show that heterogeneous features perform better than each individual type of features.