
Cao Jiatong, 翻译技术教育与研究 (Translation Technology Education and Research)


January 8, 2025, 08:01


< Chinese LLMs vs. Foreign LLMs >


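In translation circles, the acknowledged LLM favorites are ChatGPT and Claude-3.5-Sonnet, along with Llama, the MVP of the open-source world. They may excel at multilingual support, deep reasoning, and multimodal capability, but they are trained mainly on English, and English-language websites make up a large share of their datasets. When translating literary texts from Chinese into English, they are therefore more exposed to cultural gaps and differences in cultural convention, so their reading can drift from the original context, which in turn affects the quality of the output.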



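First, we selected three passages (Source Text 1, Source Text 2, and Source Text 3). To reflect the range and complexity of literary writing, they are a classical poem, a passage of older literary prose, and a modern essay, all strongly characteristic of Chinese: Li He's "Ku Zhou Duan" (《苦昼短》), Cao Xueqin's Dream of the Red Chamber (《红楼梦》), and Mao Dun's "Tribute to the White Poplar" (《白杨礼赞》). For each passage, we found a high-quality human translation to serve as the reference for BLEU evaluation.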


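Next, we wrote a specific prompt and gave it to three Chinese LLMs (ERNIE Bot (文心一言), Doubao (豆包), and Zhipu Qingyan (智谱清言)) and three foreign LLMs (ChatGPT, Claude-3.5-Sonnet, and Llama-3.1-70b-instruct), producing six translations.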


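Finally, we scored the six candidate translations with BLEU (method: running Python code, or using Data Analyst inside ChatGPT).
[BLEU (Bilingual Evaluation Understudy)]: a key metric for machine translation quality. It compares the machine output with a human reference translation and measures how much their n-grams overlap. BLEU ranges from 0 to 1: 0 means the machine translation shares nothing with the reference and is scored as very poor; 1 means it is identical to the reference and is scored as excellent. BLEU has limitations, but it remains a useful reference: it gives a fast, quantitative baseline and quickly flags translations whose quality differs markedly.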
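The definition above can be made concrete with a minimal pure-Python sketch of sentence-level BLEU. It is not the code used in this experiment, which the article does not show. It assumes whitespace tokenization, lowercasing, a single reference, n-grams up to length 4, and no smoothing; the function names `ngrams` and `sentence_bleu` are illustrative.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed BLEU of one candidate against one reference (0 to 1)."""
    # Assumption: naive tokenization by lowercasing and splitting on spaces.
    ref, cand = reference.lower().split(), candidate.lower().split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        # Clipped matches: a candidate n-gram counts at most as often
        # as it occurs in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total = sum(cand_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # one empty n-gram order zeroes the geometric mean
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


# Toy sentences, not taken from the experiment.
print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))
print(sentence_bleu("the cat sat on the mat", "the cat sat on the rug"))
```

Scores from this sketch will not match the article's figures exactly, because tokenization and smoothing choices change BLEU noticeably on short texts.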



BLEU Score Analysis







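Because the models differ in training language, we wanted each model to understand the source text as well as possible and to limit misunderstanding caused by the prompt itself. The Chinese LLMs therefore received a Chinese prompt and the foreign LLMs an English one. To test more accurately how well the models analyze the original Chinese, the prompts include no modern-vernacular (baihua) paraphrase. The prompt for Source Text 1 is shown below in its English version: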


[English Prompt]
### Suppose you are an expert in literal translation. Please translate the following poem into English.
Source Target:
### Background:
This text is excerpted from a Chinese poem called Ku Zhou Duan, written by Li He, a prominent poet of the Tang Dynasty. During the Tang era, the society, despite its outward prosperity, was fraught with underlying anxieties. The imperial court had its share of political intrigues, and the literati often felt a sense of powerlessness in the face of the vast imperial bureaucracy.
### Context:
Li He himself led a rather frustrated life. He was highly talented but struggled to achieve a smooth official career. His life was marked by poverty and illness, constantly on the margins of the mainstream success that many of his peers seemed to chase. This backdrop of personal hardship combined with the general unease of the era seeped into his creative process.
### Target Audience:
The intended audience is English speakers who like the Chinese literature.
### Tone:
The desired tone for the translation should be literal and elegant, focusing on transmitting information effectively rather than using some complicated words. The translation should imitate the style of English poem.
### Style:
Preserve the idiomatic style of the original text without translationese.
### Specific Instructions:
Ensure that idiomatic expressions and cultural references are accurately conveyed. The translation should read naturally in English, resembling the work of an English author.
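
The prompt above follows a fixed layout: a role line, the source text, then a series of "###" sections. A small helper like the following could reuse that layout for other passages. This is only a sketch: `build_prompt` and its parameters are hypothetical names, the "Source Target" label is copied as it appears in the prompt above, and the source text is left as a placeholder because the article does not reproduce it.

```python
def build_prompt(role_line, source_text, sections, source_label="Source Target"):
    """Assemble a '###'-sectioned prompt in the layout shown above."""
    # The default label is copied verbatim from the article's prompt.
    parts = [f"### {role_line}", f"{source_label}:", source_text]
    for heading, body in sections.items():
        parts.append(f"### {heading}:")
        parts.append(body)
    return "\n".join(parts)


prompt = build_prompt(
    "Suppose you are an expert in literal translation. "
    "Please translate the following poem into English.",
    "<source text goes here>",  # placeholder, not the actual passage
    {
        "Target Audience": "The intended audience is English speakers "
                           "who like the Chinese literature.",
        "Style": "Preserve the idiomatic style of the original text "
                 "without translationese.",
    },
)
print(prompt)
```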

After entering the Chinese and English prompts into the Chinese and foreign LLMs respectively, we obtained six translations:

Flying time! Flying time! Wait a minute and grab a wine.
I know not the height of heaven and the depth of earth,
I only see night alternate with day, cold with hot,
Eating away brief human life.
Those who eat dainties grow fat,
While coarse food results in skinny cat.

Fleeting light, fleeting light, I toast thee with a cup of wine.
I know not the height of the azure sky, nor the depth of the yellow earth.
I see only the chill of the moon and the warmth of the sun,
Boiling away human life.
To feast on bears brings obesity, while frogs lead to leanness.

Oh, fleeting light, fleeting light,
Let me offer you a cup of wine.
I know not how high the blue sky soars,
Nor how thick the yellow earth lies.
All I see is the cold moon and warm sun,
Stewing away the span of human life.
To eat bear makes one fat, to eat frog makes one thin,
Yet time still slips away unceasingly.

Fleeting light, oh fleeting light, let me raise a cup to you.
I know not the height of the azure sky, nor the depth of the golden earth.
Only see the cold moon and the warm sun, toasting our fleeting life.
Feasting on bear makes one plump, while dining on frogs, one grows thin.

Swift light, swift light, I bid you drink this cup of wine.
I know not the heights of the blue heavens, nor the depths of the yellow earth divine.
I only see the cold moon and the warm sun,
Relentlessly burning away the years of life, one by one.
Feast on bear’s flesh, and your body grows stout,
Feast on frogs, and thin you’ll waste about.

Fleeting light, fleeting light,
Let me urge you with a cup of wine tonight.
I know not how high the azure heavens soar,
Nor how deep the yellow earth’s core.
I only see the cold moon and warm sun
Taking turns to wear our lives undone.
Those who feast on bears grow stout,
Those who eat frogs waste about.

Swiftly fleeting, swiftly fleeting, let me offer you a cup of wine.
I am unaware of the vastness of the blue sky and the thickness of the yellow earth.
I only see the moon’s chill and the sun’s warmth, which together brew the elixir of human life.
Eating bear makes one fat, eating frog makes one thin.

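We put the human translation and the candidate translations into ChatGPT's Data Analyst, asked it to run Python-style code to compute a BLEU score for each of the six candidates, and then evaluated the translations on that basis. The results are as follows (BLEU ranges from 0 to 1; 0 means no overlap with the reference translation):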


1. Translation 1 (ERNIE Bot): 0.148
2. Translation 2 (Doubao): 0.055
3. Translation 3 (Zhipu Qingyan): 0.122
4. Translation 4 (ChatGPT): 0.083
5. Translation 5 (Claude-3.5-Sonnet): ≈0
6. Translation 6 (Llama): ≈0


Figure: BLEU scores of Chinese (black) and foreign (yellow) LLMs on Chinese-to-English translation (1)

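This poem is hard to translate, and scores are low across the board. The real meanings of "飞光", "月寒日暖", and "食熊则肥,食蛙则瘦" are implicit, so a translator has to interpret them first and then make them explicit. For example, "飞光" refers to time flying past rather than "fleeting light" (all six models merely converted the words), and "月寒日暖" is not about cold and warmth but about the alternation of moon and sun. As for the last line, "食熊则肥,食蛙则瘦", it is doubtful whether foreign readers will get the author's point when "bear" and "frog" are rendered literally. This shows the limits of LLMs compared with human translators.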

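Comparing across models, however, the highest BLEU score belongs to ERNIE Bot, followed by Zhipu Qingyan, with ChatGPT third, while the Claude and Llama translations score close to zero. The three Chinese LLMs average about 0.108, clearly higher than the roughly 0.028 of the three popular foreign models (counting ≈0 as 0). Next, we look at how the models handle older literary language.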



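The Chinese and English prompts follow those for Source Text 1 and, being long, are not listed again. After entering them, we obtained six translations: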


At that time, Daiyu was taking a nap in bed. All slave girls had gone off about their affairs, making the whole room fall silent. Baoyu lifted the embroidered door curtain and entered the inner chamber. Daiyu slept there. He quickly went up and pushed her and said, “My coz, you just ate! You always slept.”


At that moment, Daiyu was resting on the bed for her afternoon nap, while all the maidservants had stepped out to attend to their own affairs, leaving the room in a profound silence. Baoyu lifted the embroidered silk curtain and entered the inner chamber, only to find Daiyu slumbering there. He quickly approached and gently nudged her, saying, “My dear sister, you’ve just eaten and now you’re sleeping again.”


In the afternoon, Daiyu was taking a nap on the bed. The maids had all gone out to amuse themselves, leaving the room in complete silence. Baoyu lifted the embroidered silk curtain and entered the inner room. There he saw Daiyu fast asleep. He quickly went over and nudged her, saying, “My dear sister, you’ve just had your meal and now you’re sleeping again.”


At that moment, Daiyu was resting on the bed during her afternoon break, with all the maidservants out for their leisure, leaving the room in complete silence. Baoyu lifted the embroidered silk curtain and entered the inner chamber, where he found Daiyu asleep. He hurried over and shook her awake, saying with a playful scold, “Dear sister, you’ve just had your meal and now you’re off to sleep again.”


At that moment, Daiyu was resting on her bed for a midday nap. The maids had all gone out to attend to their own affairs, leaving the room utterly silent. Baoyu lifted the soft embroidered curtain and entered the inner chamber. Seeing Daiyu lying there asleep, he hurried over, nudged her, and said, “Dear sister, you’ve just eaten, and now you’re sleeping again.”


At that time, Dai-yu was taking her midday rest on the bed, and her maids had all gone out to attend to their own affairs, leaving the room in complete silence. Bao-yu lifted the embroidered silk curtain and entered the inner chamber, where he found Dai-yu sleeping. He hurriedly went up to her and shook her, saying, “Dear sister, how can you be sleeping again when you’ve just eaten?”


“At that time, Daiyu was resting on the bed, taking a noon nap. The maid servants had all gone out to attend to their own affairs, and the entire room was quiet and still. Bao Yu lifted the embroidered soft curtain and entered the inner room, where he saw Daiyu sleeping. He hurried over to push her, saying: ‘Good sister, you just ate lunch and now you’re sleeping again.'”


The BLEU scores returned by Data Analyst are as follows:

1. Translation 1 (ERNIE Bot): 0.129
2. Translation 2 (Doubao): 0.181
3. Translation 3 (Zhipu Qingyan): 0.114
4. Translation 4 (ChatGPT): 0.125
5. Translation 5 (Claude-3.5-Sonnet): 0.079
6. Translation 6 (Llama): 0.101

Figure: BLEU scores of Chinese (black) and foreign (yellow) LLMs on Chinese-to-English translation (2)

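As before, black marks LLMs trained mainly on Chinese and yellow marks LLMs trained mainly on English. For the second passage, the six BLEU scores are fairly even. The translation closest to the human reference comes from Doubao, followed by ERNIE Bot and ChatGPT.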



After entering the Chinese and English prompts into the Chinese and foreign LLMs respectively, we obtained six translations:


They are white poplar. Though very common in Northwest China, they are no ordinary tree! You may call them unattractive because they have neither the graceful carriage of a dancer, nor such branches as can twine and climb. But they are big and tall, honest and upright, simple and plain, earnest and unyielding—and not without gentleness and warmth though. They are giants among trees!
This is the Populus alba, an extremely common tree in the northwest, yet it is anything but ordinary! It lacks the graceful posture and the twisted, winding branches. Perhaps you might say it’s not beautiful — if beauty is defined solely by terms like “graceful” or “irregularly spreading.” In that case, the Populus alba cannot be considered a beauty among trees; yet, it is magnificent, upright, unpretentious, solemn, and not devoid of warmth. Moreover, its unyielding strength and stiffness need not be mentioned. It is the majestic man among trees!
This is the white poplar, an extremely common tree in the northwest, yet by no means an ordinary one! It doesn’t have a graceful, swaying posture, nor does it possess gnarled, coiling branches. Perhaps you’d say it isn’t beautiful – if beauty is strictly defined as “swaying gracefully” or “growing in a slanting, wayward manner”, then the white poplar can hardly be counted as a beauty among trees. However, it is stalwart, upright, simple, solemn, and not lacking in gentleness. Not to mention its unwavering strength and its tall, straight stature. It is truly the great hero among trees!
This is the poplar tree, an extremely common sight in the northwest, yet by no means an ordinary tree! It lacks the graceful posture and the coiling, twisting branches that one might associate with beauty—if beauty is to be defined solely by such characteristics as “graceful” or “rambling with abandon.” In that case, the poplar might not be deemed a fair maid among trees. But it stands tall and upright, simple and solemn, not without gentleness, and its unyielding strength and towering presence are beyond words. It is the noble patriarch among trees!
This is the white poplar, a tree most common in the northwest, yet by no means an ordinary one! It lacks the graceful form of swaying foliage, nor does it boast the gnarled, winding branches that twist and turn. Perhaps you would say it is not beautiful—if beauty is solely defined by “swaying elegance” or “leaning gracefully in unrestrained poise.” In that case, the white poplar cannot be counted as the fair maiden among trees. But it is grand, upright, unpretentious, solemn, and not without gentleness. And let us not even begin to speak of its strength, resilience, and towering stature—it is truly the great hero among trees!
This is the white poplar, a common tree in the Northwest, yet by no means an ordinary one! It possesses neither swaying elegance nor twisted, coiling branches. Perhaps you might say it lacks beauty — if beauty solely refers to “graceful swaying” or “extending sideways with abandon,” then the white poplar would not qualify as a fair maiden among trees. However, it is towering, upright, plain, and solemn, not without gentleness, not to mention its unyielding strength and straight posture — it is truly a great gentleman among trees!
This is the white poplar, an ordinary tree in the northwest, yet by no means an ordinary tree! It doesn’t have a slender figure, nor does it have twisted and curved branches. Perhaps you’d say it’s not beautiful – if beauty refers specifically to being ‘slender’ or ‘luxuriantly overhanging’. In that case, the white poplar can’t be considered a good-looking tree. However, it is majestic, upright, unadorned, and solemn, and it’s not lacking in gentleness either. Not to mention its unyielding resilience and towering stature – it’s a great hero among trees!”

The BLEU scores returned by Data Analyst are as follows:

1. Translation 1 (ERNIE Bot): ≈0
2. Translation 2 (Doubao): ≈0
3. Translation 3 (Zhipu Qingyan): 0.047
4. Translation 4 (ChatGPT): ≈0
5. Translation 5 (Claude-3.5-Sonnet): ≈0
6. Translation 6 (Llama): ≈0

Figure: BLEU scores of Chinese (black) and foreign (yellow) LLMs on Chinese-to-English translation (3)
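
Scores this close to zero are a known behavior of unsmoothed BLEU on short texts. BLEU is a geometric mean of the 1-gram to 4-gram precisions, so a candidate that shares no 4-gram with the reference scores 0 regardless of how many single words match. Whether that is what happened here is an assumption, since the scoring code is not shown. The sketch below uses made-up match counts and a simplified add-one smoothing on every n-gram order, and it leaves out the brevity penalty; `bleu_core` is an illustrative name.

```python
import math


def bleu_core(matches, totals, smooth=False):
    """Geometric mean of n-gram precisions (brevity penalty omitted)."""
    log_sum = 0.0
    for m, t in zip(matches, totals):
        if smooth:
            # Simplified add-one smoothing, applied to every order here.
            m, t = m + 1, t + 1
        if m == 0:
            return 0.0  # a single zero precision collapses the whole score
        log_sum += math.log(m / t)
    return math.exp(log_sum / len(matches))


# Hypothetical counts for a 60-token candidate: many word matches,
# a few 2-gram and 3-gram matches, and no 4-gram match at all.
matches = [30, 12, 4, 0]
totals = [60, 59, 58, 57]

print(bleu_core(matches, totals))               # unsmoothed score
print(bleu_core(matches, totals, smooth=True))  # smoothed score
```

Because of this effect, a score of ≈0 says that long word sequences did not line up with the reference, not that the translation is necessarily bad.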



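BLEU has limitations: although it gives a quantitative reading of translation quality, it cannot measure a translation's semantic depth, cultural transmission, or overall style. The experiment is also small and needs further validation: only three Chinese literary passages were tested, so the results offer limited reference value.
Judging by the data alone, however, the Chinese models (ERNIE Bot, Doubao, Zhipu Qingyan) generally score higher on BLEU than the foreign models (ChatGPT, Claude, Llama). Measured by BLEU, the Chinese models perform comparatively well on Chinese-to-English translation of Chinese literary texts.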
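
The group comparison can be checked directly from the scores reported in this article. The sketch below averages each group per passage. It enters every "≈0" as 0.0, which is an assumption, and the dictionary layout and the name `group_means` are illustrative.

```python
from statistics import mean

CHINESE = {"ERNIE Bot", "Doubao", "Zhipu Qingyan"}

# BLEU scores as reported above; "≈0" is entered as 0.0 (an assumption).
scores = {
    "Text 1": {"ERNIE Bot": 0.148, "Doubao": 0.055, "Zhipu Qingyan": 0.122,
               "ChatGPT": 0.083, "Claude-3.5-Sonnet": 0.0, "Llama": 0.0},
    "Text 2": {"ERNIE Bot": 0.129, "Doubao": 0.181, "Zhipu Qingyan": 0.114,
               "ChatGPT": 0.125, "Claude-3.5-Sonnet": 0.079, "Llama": 0.101},
    "Text 3": {"ERNIE Bot": 0.0, "Doubao": 0.0, "Zhipu Qingyan": 0.047,
               "ChatGPT": 0.0, "Claude-3.5-Sonnet": 0.0, "Llama": 0.0},
}


def group_means(by_model):
    """Return (Chinese mean, foreign mean), each rounded to 3 decimals."""
    chinese = mean(v for k, v in by_model.items() if k in CHINESE)
    foreign = mean(v for k, v in by_model.items() if k not in CHINESE)
    return round(chinese, 3), round(foreign, 3)


for text, by_model in scores.items():
    print(text, group_means(by_model))
# Text 1 (0.108, 0.028)
# Text 2 (0.141, 0.102)
# Text 3 (0.016, 0.0)
```

With only three models per group and three passages, these gaps are indicative only and are not statistically tested.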



