Research purpose:

1. The article analyzes the factors in the evaluation of machine translation (MT) output quality that skew results in favor of MT and against human translation;
2. The article offers suggestions for improving evaluation methods so that the quality of MT and human translation can be assessed more fairly.

“The present article intends to contribute to this debate by attempting to identify further aspects of current MT quality evaluation methodologies which may lead to overvaluing MT performance while undervaluing (professional) human translation performance (this phenomenon will be called ‘MT bias’ in the remainder of this article). Also, it will offer some Translation Studies informed suggestions on how these methodologies could be further improved or debiased in order to arrive at a more balanced picture of MT vs.
human translation quality.”

Research significance:

The article argues that appropriate and fair methods for evaluating MT quality are crucial from the perspective of both professional translation and Translation Studies. The issue is not merely academic; it has practical consequences. First, it affects whether students choose to study translation and may drive scholars or professionals out of the field. Second, it may reduce funding for translation research and push down professional translation rates.

02  Literature Review and Methodological Analysis

Google:

[Google: Bridging the gap between human and machine translation]

1. Criteria: a scale from 0 (lowest) to 6 (highest: the meaning of the translation fully matches the source text and the grammar is correct)
2. Text source: relatively simple, isolated sentences selected from Wikipedia and news articles
3. Raters: fluent in both the source and target languages (human raters who are fluent in both
languages)
4. Flaws of the Google experiment:
① The source texts all came from a single, narrow domain, and one that professional translators do not typically work in: “the texts used to measure MT performance were drawn from a domain which is not representative of the domains that professional human translators usually translate on a daily basis”;
② Raters were shown isolated sentences without any surrounding context: “raters were presented with isolated sentences which they had to judge without taking the wider document context into account”;
③ The raters were most likely not professional translators: “raters who were most likely not professional translators”.

Microsoft:

[Microsoft: Parity between professional human and machine translation]

1. Criteria: a scale from 0 (lowest) to 100 (highest), rating whether the translation accurately conveys the semantic meaning of the source text;
2. Translation sources: ① three Microsoft NMT systems; ② three reference systems; ③ Reference-HT; ④ Reference-PE; ⑤ HT from translators for the
WMT17;
3. Difference from Google's NMT: Google's system is based on an RNN architecture, whereas the systems reported in the Microsoft paper are based on the Transformer architecture;
4. Flaw of the Microsoft experiment (unsuitable criterion):
For Microsoft, human parity is reached when there is no statistically significant difference between the quality of the machine translations and that of the human translations. Under this criterion the quality of the reference translations is critical, because they stand in for the ability of professional translators and serve as the standard against which MT output is compared.
“The decisive factors here are the quality of the reference translations, which are intended to be representative of professional human translation and which are to serve as a standard of comparison for MT quality, as well as the question of how the scores for human and machine translations were obtained.”

Critique of Microsoft's experimental method:

1. Toral et al. point out that the source texts used in Microsoft's evaluation campaign (taken from WMT 2017) are problematic: “It turned out that half of the sentence pairs were originally written in English, then human-translated into Chinese and finally machine-translated back into English by
Microsoft.”
2. Toral et al. also take issue with the bilingual raters Microsoft recruited (a criticism that applies equally to the Google MT evaluation discussed above); inter-rater agreement among these raters was relatively high (i.e., the variability of their quality judgments was relatively low);
3. The human translations that Microsoft used as the standard of comparison for its NMT systems were found to contain numerous grammatical errors and mistranslations in both Chinese and English, which led Toral to suspect that these human reference translations may have been produced by non-expert translators;
4. Toral et al. criticize Microsoft's evaluation campaign for presenting only isolated sentences without context.

Note: supplementary reference “A Set of Recommendations for Assessing Human-Machine Parity in Language Translation” (Läubli et al.
2020). In this paper, Läubli et al. make the following recommendations:
(R1) professional translators should be chosen as raters;
(R2) evaluation campaigns should assess complete documents rather than isolated sentences;
(R3) fluency should be assessed in addition to adequacy;
(R4) reference translations should not be heavily edited for fluency;
(R5) original source texts should be used in evaluation campaigns.

CUBBITT:

[CUBBITT: Human translation is not the upper bound of translation quality]

1. Criteria: document level;
2. Raters: native speakers = professional translators;
3. Procedure: translation evaluation, error analysis, translation Turing test;
4. Claim: human translation is not necessarily the upper bound of translation quality: “human translation is not necessarily an upper bound of translation quality”;
5. Why it is the hardest to criticize: it incorporates the evaluation recommendations made by Läubli et al.: “It is also the hardest paper to criticise for its methodology because it incorporates several of the recommendations by Läubli et al. (2020)”
6.