A Roundup of the Major AI Models Released in 2024!

OpenAI, Meta, Google, and Anthropic introduced major artificial intelligence (AI) model updates this year, which primarily focused on three key trends:

higher-performing multi-modal large language models (LLMs), combining video, audio, and text;

the growth of AI agents; and

the rise of small language models (SLMs) that are cheaper to run and more task-specific.

OpenAI

Sora: Sora was the first major text-to-video model launch capable of generating realistic videos up to a minute long from textual descriptions. (February)

GPT-4o: GPT-4o (the “o” stands for “omni”) is a model with voice and computer-vision capabilities, transforming ChatGPT into a virtual assistant with functionalities like real-time language translation/interpreting and tone modulation. (May)

o1 Model: The o1 model focuses on reasoning-heavy tasks by leveraging a technique called chain-of-thought (CoT) reasoning. (September)

OpenAI’s Sora was an impressive launch but never became publicly available, leaving many questions about its effectiveness. GPT-4o, meanwhile, changed the game for me: I now have full conversations with GPT as I drive in the car or tackle random questions on the go. Its on-demand speech translation is also handy enough in a pinch.

Meta

Llama 3 Series: Meta released the open-sourced Llama 3 series, including versions 3.1 and 3.2. Llama 3.2 introduced vision-capable LLMs and lightweight text-only models designed for edge and mobile devices. (July and September)

Meta Movie Gen: Meta unveiled an AI video-generating tool capable of creating videos up to 16 seconds long from text prompts, reportedly outperforming competitors like OpenAI’s Sora and Google’s Veo. (October)

Llama is a favorite among model builders due to cost savings. However, I’m concerned about data privacy; if our Facebook data has been used to train these models, what are the implications of then releasing them as open source?

Google

Gemini 1.5 Series: These models offer faster output speeds and lower latency, which allow for more affordable application development. However, Google’s Gemini series faced significant backlash when its image generation feature produced historically inaccurate and offensive depictions, causing the feature to be pulled off the market for a period of time. (February)

AlphaChip AI: Google DeepMind announced AlphaChip, an AI-driven method for designing electronic chip layouts, marking a significant advancement in AI-assisted hardware design. (September)

Remember when Google’s Gemini was called Bard? That change happened this year, thankfully. Google struggled to regain credibility after the backlash over flaws in its image generation, which depicted historical figures incorrectly (for example, George Washington as African American). The controversy brought to the surface how much influence model creators have over output: diversity, equity, and inclusion (DEI) policies or internal biases can lead to distorted results.

Anthropic

Claude 3.5: Anthropic’s most popular launch was Claude 3.5 Sonnet, an advanced model that excels at understanding nuanced instructions and generating sophisticated analyses. It features a Prompt Playground for developers to efficiently create and refine prompts for AI application development. (June)

Computer Use Capability: Anthropic launched a “computer use” feature, enabling AI to essentially control your computer and perform tasks similar to human-computer interactions (such as moving cursors, typing, and browsing the internet). This feature has been adopted by companies like Asana, Canva, and DoorDash. (October)

The word on the street from some of the underdogs I’ve spoken to is that Claude 3.5 Sonnet is becoming more favored in AI translation pipelines. The computer-use feature launch signaled the next era of AI capabilities.

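Note: This article is excerpted from the Multilingual website and is shared for learning and discussion purposes only. If it infringes any rights, please contact the editor to have it removed.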
