{"id":2985,"date":"2023-06-18T10:46:58","date_gmt":"2023-06-18T02:46:58","guid":{"rendered":"https:\/\/linguaresources.com\/?p=2985"},"modified":"2023-06-18T11:55:42","modified_gmt":"2023-06-18T03:55:42","slug":"speechgen%ef%bc%9a%e7%94%a8%e6%8f%90%e7%a4%ba%e8%a7%a3%e9%94%81%e8%af%ad%e9%9f%b3%e8%af%ad%e8%a8%80%e6%a8%a1%e5%9e%8bspeech-lm%e7%9a%84%e7%94%9f%e6%88%90%e8%83%bd%e5%8a%9b","status":"publish","type":"post","link":"https:\/\/linguaresources.com\/?p=2985","title":{"rendered":"SpeechGen: Unlocking the Generative Ability of Speech Language Models (Speech LM) with Prompts"},"content":{"rendered":"\n \n<\/p>\n
\n <\/strong><\/span><\/strong><\/em> <\/strong><\/span><\/strong><\/em><\/span>Paper link:<\/span><\/strong>\n<\/p>\n\n https:\/\/arxiv.org\/pdf\/2306.02207.pdf<\/span>
\n<\/section>\n\n <\/strong><\/span><\/strong><\/em> <\/strong><\/span><\/strong><\/em><\/span>Demo:<\/span><\/strong>\n<\/p>\n\n https:\/\/ga642381.github.io\/SpeechPrompt\/speechgen.html<\/span>
\n<\/section>\n\n <\/strong><\/span><\/strong><\/em> <\/strong><\/span><\/strong><\/em><\/span>Code:<\/span><\/strong>\n<\/p>\n\n https:\/\/github.com\/ga642381\/SpeechGen<\/span>\n<\/p>\n\n \n<\/p>\n
\n \n<\/p>\n\n\n\n\n\n
\n <\/section>\n<\/section>\n\n\n\n Introduction and Motivation<\/strong><\/span><\/span>
\n <\/section>\n<\/section>\n<\/section>\n<\/section>\n<\/section>\n<\/section>\n\n Large language models (LLMs) have attracted considerable attention in AI-generated content (AIGC), especially since the emergence of ChatGPT.<\/span>\n<\/p>\n\n
<\/span>\n<\/p>\n\n However, processing continuous speech with large language models remains an unsolved challenge, and this has held back their application to speech generation.<\/span>\n<\/p>\n\n
<\/span>\n<\/p>\n\n Because speech signals carry rich information beyond plain text, including speaker identity and emotion, speech-based language models (Speech Language Model, Speech LM) keep emerging.<\/span>\n<\/p>\n\n
<\/span>\n<\/p>\n\n Speech language models are still at an early stage compared with text-based language models, but they hold great potential because speech data contains richer information than text.<\/span>\n<\/p>\n\n
<\/span>\n<\/p>\n\n Researchers are actively exploring the prompting (prompt) paradigm to harness the capabilities of pre-trained language models. Prompting tunes a small number of parameters to steer a pre-trained language model toward a specific downstream task. The technique is popular in NLP because it is both efficient and effective. In speech processing, SpeechPrompt has shown significant improvements in parameter efficiency and achieved competitive performance on a variety of speech classification tasks.<\/span>\n<\/p>\n\n
<\/span>\n<\/p>\n\n However, whether prompting can help speech language models perform generation tasks remains an open question. In this work, we propose a unified framework, SpeechGen, that aims to unlock the potential of speech language models for generation tasks. As shown in the figure below, feeding a speech segment together with a task-specific prompt into the speech LM lets it perform that task. For example, given the red prompt as input, the speech LM performs speech translation.<\/span>\n<\/p>\n\n
<\/span>\n<\/p>\n\n \n<\/p>\n\n Our proposed framework has the following advantages:<\/span>
\n<\/section>\n\n
<\/span>
\n<\/section>\n\n 1.<\/span>Textless:<\/span><\/strong> Our framework and the speech language models it relies on are independent of text data, which is a major practical benefit. Obtaining labeled text paired with speech is time-consuming and tedious, and for some languages suitable text is not available at all. Because no text is required, the speech generation capability can extend to a wide range of languages.<\/span>
\n<\/section>\n\n
<\/span>
\n<\/section>\n\n 2.<\/span>Versatility:<\/span><\/strong> The framework is highly general and applies to a wide variety of speech generation tasks. The experiments in the paper use speech translation, speech inpainting, and speech continuation as examples. <\/span>
\n<\/section>\n\n
<\/span>
\n<\/section>\n\n 3.<\/span>Easy to follow:<\/span><\/strong> The framework offers a general solution for all kinds of speech generation tasks, which makes designing downstream models and loss functions straightforward.<\/span>
\n<\/section>\n\n
<\/span>
\n<\/section>\n\n 4.<\/span>Transferability:<\/span><\/strong> The framework adapts easily to more advanced speech language models in the future, leaving room for further gains in efficiency and effectiveness. As more advanced speech language models are released, the framework stands to become more capable. <\/span>
\n<\/section>\n\n
<\/span>
\n<\/section>\n\n 5.<\/span>Affordability:<\/span><\/strong> The framework is designed so that only a small number of parameters need training, rather than the entire large language model. This greatly reduces the computational burden and allows training to run on a GTX 2080 GPU. University labs can afford this computational cost.<\/span>
\n<\/section>\n\n \n<\/p>\n\n\n\n\n\n
\n <\/section>\n<\/section>\n\n\n\n SpeechGen<\/strong><\/span>
\n <\/section>\n<\/section>\n<\/section>\n<\/section>\n<\/section>\n<\/section>\n\n <\/span>
\n<\/section>\n\n \n<\/p>\n\n Our approach builds a new framework, SpeechGen, which adapts speech language models (Speech Language Model, Speech LM) to various downstream speech generation tasks. During training, the parameters of the Speech LM stay fixed, and the method focuses on learning task-specific prompt (Prompt) vectors. Conditioned on both the prompt vectors and the input units, the Speech LM generates the output required for a given speech generation task. These discrete output units are then fed into a unit-based speech synthesizer, which generates the corresponding waveform.<\/span>
\n<\/section>\n\n
<\/span>
\n<\/section>\n\n The SpeechGen framework consists of three components: a speech encoder, a Speech LM, and a speech decoder (Speech Decoder). First, the speech encoder takes a waveform as input and converts it into a sequence of units drawn from a finite vocabulary. To shorten the sequence, consecutive repeated units are removed, yielding a compressed unit sequence. Next, the Speech LM acts as a language model over unit sequences, optimizing the likelihood by predicting each subsequent unit given the preceding units. We apply prompt tuning to the Speech LM to steer it toward generating the appropriate units for each task. Finally, the tokens generated by the Speech LM are processed by the speech decoder, which converts them back into a waveform. In our prompt tuning strategy, prompt vectors are inserted at the beginning of the input sequence, which guides the Speech LM during generation. The number of prompts inserted depends on the architecture of the Speech LM. In a sequence-to-sequence model, prompts are added to both the encoder input and the decoder input, whereas in an encoder-only or decoder-only architecture a prompt is prepended only to the input sequence.<\/span>
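The unit compression and prompt insertion steps described above can be sketched in a few lines. This is a minimal illustration only: the unit IDs are made up, and the PROMPT_i strings are hypothetical placeholders standing in for learned prompt vectors, not anything taken from the SpeechGen code.

```python
from itertools import groupby

def collapse_repeats(units):
    # Keep a single unit from each run of identical consecutive units.
    return [unit for unit, _ in groupby(units)]

def prepend_prompts(units, num_prompts):
    # Hypothetical placeholder names standing in for learned prompt vectors.
    prompts = ['PROMPT_%d' % i for i in range(num_prompts)]
    return prompts + list(units)

# Made-up frame-level unit IDs, as a speech encoder might emit them.
frame_units = [12, 12, 12, 7, 7, 31, 12, 12]
compressed = collapse_repeats(frame_units)
model_input = prepend_prompts(compressed, num_prompts=2)
print(compressed)
print(model_input)
```

Only adjacent duplicates are merged, so a unit that reappears later in the sequence is kept.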
\n<\/section>\n\n
<\/span>
\n<\/section>\n\n For sequence-to-sequence Speech LMs (such as mBART), we use a self-supervised learning model (such as HuBERT) to process the input and target speech. This produces discrete units for the input and the corresponding discrete units for the target. We prepend prompt vectors to both the encoder and decoder inputs to construct the input sequences. In addition, we further strengthen the guidance of the prompts by replacing the key-value pairs in the attention mechanism.<\/span>
\n<\/section>\n\n
<\/span>
\n<\/section>\n\n During training, we use cross-entropy loss as the objective for all generation tasks, computing the loss by comparing the model's predictions with the target discrete unit labels. The prompt vectors are the only trainable parameters in the model, while the parameters of the Speech LM stay fixed throughout training, which keeps the model's behavior consistent. By inserting prompt vectors, we guide the Speech LM to extract task-specific information from the input and raise the likelihood of producing output suited to the given speech generation task. This approach lets us adjust the behavior of the Speech LM without modifying its underlying parameters.<\/span>
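The training setup above (cross-entropy on target units, with the prompt as the only trainable parameter) can be illustrated with a toy example. This is a sketch under strong assumptions: the frozen model here is a made-up mean-pooling layer with arbitrary weights, standing in for a real Speech LM such as Unit mBART, and all names and values are hypothetical.

```python
import math

# Frozen toy model. The values are arbitrary assumptions for illustration.
EMBED = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5]]   # one row per input unit
OUT_W = [[1.0, -1.0], [-1.0, 1.0], [0.5, 0.5]]  # one row per output unit

def softmax(logits):
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def forward(prompt, units):
    # Prepend the prompt vector to the unit embeddings, then mean-pool.
    vectors = [prompt] + [EMBED[u] for u in units]
    hidden = [sum(col) / len(vectors) for col in zip(*vectors)]
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in OUT_W]
    return softmax(logits), len(vectors)

def loss_and_grad(prompt, units, target):
    probs, count = forward(prompt, units)
    loss = -math.log(probs[target])  # cross-entropy against the target unit
    # d(loss)/d(logits) = probs - one_hot(target)
    dlogits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
    # Chain rule back to the prompt only. EMBED and OUT_W get no update.
    grad = [sum(dlogits[k] * OUT_W[k][d] for k in range(len(OUT_W))) / count
            for d in range(len(prompt))]
    return loss, grad

def tune_prompt(units, target, steps=200, lr=0.5):
    prompt = [0.0, 0.0]
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(prompt, units, target)
        history.append(loss)
        prompt = [p - lr * g for p, g in zip(prompt, grad)]
    return prompt, history

prompt, history = tune_prompt(units=[0, 2, 1], target=1)
print(round(history[0], 4), round(history[-1], 4))
```

The loss falls over the run even though EMBED and OUT_W never change, which is the point of the design: task behavior is carried entirely by the small prompt vector.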
\n<\/section>\n\n
<\/span>
\n<\/section>\n\n In summary, our approach is based on a new framework, SpeechGen, which trains prompt vectors to guide the model's generation process so that it effectively produces output suited to a specific speech generation task.<\/span>
\n<\/section>\n\n \n<\/p>\n\n\n\n\n\n
\n <\/section>\n<\/section>\n\n\n\n Experiments<\/strong><\/span>
\n <\/section>\n<\/section>\n<\/section>\n<\/section>\n<\/section>\n<\/section>\n\n <\/span>
\n<\/section>\n\n Our framework can be applied to any speech LM and to a wide range of generation tasks. In our experiments, because VALL-E and AudioLM are not open source, we chose Unit mBART as the speech LM for a case study. We use speech translation, speech inpainting, and speech continuation as examples to demonstrate the framework's capabilities. The three tasks are illustrated in the figure below. All tasks take speech as input and produce speech as output, with no help from text.<\/span>
\n<\/section>\n\n
<\/span>
\n<\/section>\n\n
\n<\/section>\n\n \n<\/p>\n