{"id":23253,"date":"2023-12-06T00:58:08","date_gmt":"2023-12-05T16:58:08","guid":{"rendered":"https:\/\/linguaresources.com\/?p=23253"},"modified":"2024-07-18T21:46:18","modified_gmt":"2024-07-18T13:46:18","slug":"xcomet%ef%bc%9a%e7%bf%bb%e8%af%91%e8%b4%a8%e9%87%8f%e5%88%86%e6%9e%90%e7%9a%84%e6%96%b0%e9%a2%86%e5%9f%9f","status":"publish","type":"post","link":"https:\/\/linguaresources.com\/?p=23253","title":{"rendered":"XCOMET\uff1a\u7ffb\u8bd1\u8d28\u91cf\u5206\u6790\u7684\u65b0\u9886\u57df"},"content":{"rendered":"

Automatic metrics for machine translation evaluation have come a long way, with neural metrics like COMET (Rei et al. 2020) and BLEURT (Sellam et al. 2020) leading the charge in improving translation quality assessment. These metrics correlate markedly better with human judgments than traditional metrics like BLEU (Papineni et al. 2002). Powerful as they are, they share a limitation: they provide only a single sentence-level score, leaving individual translation errors hidden beneath the surface.

In an era where large language models (LLMs) have revolutionized natural language processing, researchers have started to employ them for more granular translation error assessment (Fernandes et al. 2023, Kocmi et al. 2023). This involves not just evaluating the translation as a whole but also pinpointing and categorizing specific errors, providing a deeper and more insightful view into translation quality.

This is where XCOMET comes in. XCOMET is an open-source metric designed to bridge the gap between these two evaluation approaches by combining sentence-level evaluation with error span detection. The result? State-of-the-art performance across all types of evaluation, from sentence-level to system-level, while also highlighting and categorizing error spans, enriching the quality assessment process.
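To make the idea of "one score plus error spans" concrete, here is a minimal sketch of what such a combined result could look like. The class names, field names, severity labels, and the example score are all hypothetical and chosen for illustration; they are not the actual XCOMET output format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ErrorSpan:
    start: int     # character offset where the error begins (inclusive)
    end: int       # character offset where the error ends (exclusive)
    severity: str  # assumed labels, e.g. "minor", "major", "critical"


@dataclass
class Evaluation:
    translation: str
    score: float                                 # single sentence-level score
    spans: List[ErrorSpan] = field(default_factory=list)


def annotate(ev: Evaluation) -> str:
    """Wrap each error span in [severity: ...]; assumes spans do not overlap."""
    out, cursor = [], 0
    for span in sorted(ev.spans, key=lambda s: s.start):
        out.append(ev.translation[cursor:span.start])
        out.append(f"[{span.severity}: {ev.translation[span.start:span.end]}]")
        cursor = span.end
    out.append(ev.translation[cursor:])
    return "".join(out)


# Illustrative values only, not real metric output.
ev = Evaluation("The cat sat on the moon.", 0.62, [ErrorSpan(19, 23, "major")])
print(annotate(ev))  # The cat sat on the [major: moon].
```

A score alone would only say that this sentence is imperfect; the span shows where the problem is and how serious it is.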

But what sets XCOMET apart from the rest? Here's a closer look at what makes it a game-changer:

  1. Detailed Error Analysis: Unlike traditional metrics that offer just one score, XCOMET digs deeper by identifying and categorizing specific translation errors. This fine-grained approach provides a more comprehensive understanding of the quality of the translation.
  2. Robust Performance: XCOMET has been rigorously tested and outperforms widely used neural metrics and generative LLM-based approaches across the evaluation settings we considered, from segment level to system level.
  3. Robustness and Reliability: The XCOMET suite of metrics excels at identifying critical errors and hallucinations, making it a reliable choice for evaluating translation quality, even in challenging scenarios.
  4. Versatility: XCOMET is a unified metric that accommodates all modes of evaluation: with a reference, without one (quality estimation), or even when no source is provided. This flexibility makes it a valuable tool for translation evaluation.
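The evaluation modes in the last item differ only in which inputs are available. The sketch below illustrates that distinction; the `evaluation_mode` helper and its mode names are hypothetical, and the `src`/`mt`/`ref` keys are assumed field names for the source, the machine translation, and the reference.

```python
from typing import Dict


def evaluation_mode(sample: Dict[str, str]) -> str:
    """Name the evaluation mode implied by the fields present in a sample."""
    if not sample.get("mt"):
        raise ValueError("a machine translation ('mt') is always required")
    has_src = bool(sample.get("src"))
    has_ref = bool(sample.get("ref"))
    if has_src and has_ref:
        return "reference-based"      # source and reference both available
    if has_src:
        return "quality-estimation"   # no reference, source only
    if has_ref:
        return "reference-only"       # no source provided
    raise ValueError("need at least a source ('src') or a reference ('ref')")


print(evaluation_mode({"src": "Guten Morgen", "mt": "Good morning"}))
# quality-estimation
```

The point of a unified metric is that all three cases go to the same model rather than to three separate ones.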

How Does XCOMET Compare with AutoMQM and Other Metrics?

Let's delve into the results. We've conducted thorough evaluations comparing XCOMET with other widely known metrics, including recent LLM-based ones. Take a look at these two tables:

[Two tables, included as images in the original post, are not reproduced here.]

These tables highlight the exceptional performance of XCOMET. In segment-level evaluations, XCOMET outperforms other widely known metrics, including recent LLM-based metrics such as GEMBA-GPT4. When compared at the word level with AutoMQM based on GPT-4, XCOMET maintains its lead, even when used without a reference in a quality estimation scenario!

It's worth noting that AutoMQM based on GPT-4, while impressive at the word level, relies on large and costly LLMs, limiting its accessibility and applicability. XCOMET, on the other hand, outperforms GPT-4 and thrives with cost-effective LLMs like GPT-3, making it a versatile choice for researchers and practitioners in the field.

To make XCOMET accessible to the community, we have released two evaluation models: XCOMET-XL, with 3.5 billion parameters, and XCOMET-XXL, with 10.7 billion parameters. These models are available through the COMET framework and the Hugging Face Model Hub: