In a July 30, 2024 research paper, Otso Haavisto and Robin Welsch from Aalto University presented a web application designed to simplify the process of adapting questionnaires for different languages and cultures.
The tool aims to help researchers conducting cross-cultural studies by improving the quality and efficiency of questionnaire adaptation while promoting more equitable research practices.
Haavisto and Welsch highlighted that translating questionnaires is often costly and “resource-intensive,” requiring multiple independent translators and extensive validation processes. According to the authors, this complexity has led to inequalities in research, particularly in non-English-speaking and low-income regions where access to quality questionnaires is limited.
In questionnaire translation, maintaining semantic similarity is crucial to ensure that the translated version retains the same meaning as the original. As the authors noted, “semantic similarity is more important than word-by-word match.” According to the authors, cultural nuances and colloquial expressions can further complicate this process, making it difficult to achieve accurate translations.
To address these challenges, they developed a web application that allows users to translate questionnaires, edit translations, backtranslate to the source language for comparisons against the original, and receive evaluations of translation quality generated by a large language model (LLM).
The tool integrates DeepL for initial translations and GPT-4 for evaluating and suggesting improvements. The decision to use DeepL was based on its “reliable output and promising results in translating scientific text,” which the authors said was essential for the accuracy of research questionnaires.
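The core workflow the authors describe (machine translation, backtranslation to the source language, and a similarity check against the original) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the translation function is a hypothetical stand-in for a real DeepL client call, and the crude lexical ratio stands in for the LLM-generated semantic-similarity evaluation that the actual tool delegates to GPT-4.

```python
import difflib

def translate(text, source, target):
    # Hypothetical stand-in for a machine-translation API call
    # (the prototype uses DeepL; a real client would go here).
    glossary = {
        ("en", "de"): {"How satisfied are you?": "Wie zufrieden sind Sie?"},
        ("de", "en"): {"Wie zufrieden sind Sie?": "How satisfied are you?"},
    }
    return glossary[(source, target)].get(text, text)

def backtranslation_check(item, source="en", target="de"):
    """Translate an item, backtranslate it, and score the round trip."""
    forward = translate(item, source, target)
    back = translate(forward, target, source)
    # Crude lexical proxy for comparison; the paper stresses that
    # semantic similarity matters more than word-by-word match,
    # which is why the actual tool scores translations with an LLM.
    similarity = difflib.SequenceMatcher(
        None, item.lower(), back.lower()
    ).ratio()
    return forward, back, similarity

forward, back, score = backtranslation_check("How satisfied are you?")
print(forward)  # Wie zufrieden sind Sie?
print(back)     # How satisfied are you?
```

In the actual prototype, the similarity step is replaced by prompting GPT-4 to rate the backtranslation against the original and to suggest improvements, which is what lets the tool flag context-specific problems rather than surface-level wording differences.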
“We set out to develop a prototype of a questionnaire translation tool that would exploit the versatility of LLMs in natural language processing tasks to the benefit of researchers conducting cross-cultural studies,” they said.
Haavisto and Welsch tested the tool’s effectiveness through two online studies: one involving 10 participants testing the English-German language pair and another involving 20 participants testing the English-Portuguese language pair. Both studies showed “promising results regarding LLM adoption in the questionnaire translation process,” according to the authors.
Moderately Helpful
The studies’ findings indicated that machine translation, when supplemented by GPT-4-generated quality scores, produced translation quality and semantic similarity comparable to traditional translation. Participants also found the GPT-4-generated suggestions “moderately helpful” and accurate in representing translation quality.
Haavisto and Welsch also noted that LLM-generated translation quality evaluations can assist researchers in identifying and addressing context-specific issues in their translations, highlighting that “this is the first step towards more equitable questionnaire-based research, powered by AI.”
The tool currently supports translations in English, German, Portuguese, and Finnish — although Finnish remains untested. The code for the prototype is publicly available on GitHub, inviting further exploration and contributions from the community.