


Everyone in AI is talking about Manus. We put it to the test.

Since the general AI agent Manus was launched last week, it has spread online like wildfire. And not just in China, where it was developed by the Wuhan-based startup Butterfly Effect. It’s made its way into the global conversation, with influential voices in tech, including Twitter cofounder Jack Dorsey and Hugging Face product lead Victor Mustar, praising its performance. Some have even dubbed it “the second DeepSeek,” comparing it to the earlier AI model that took the industry by surprise for its unexpected capabilities as well as its origin.

Manus claims to be the world’s first general AI agent, leveraging multiple AI models (such as Anthropic’s Claude 3.5 Sonnet and fine-tuned versions of Alibaba’s open-source Qwen) and various independently operating agents to act autonomously on a wide range of tasks. (This makes it different from AI chatbots, including DeepSeek, which are based on a single large language model family and are primarily designed for conversational interactions.)
Despite all the hype, very few people have had a chance to use it. Currently, under 1% of the users on the wait list have received an invite code. (It’s unclear how many people are on this list, but for a sense of how much interest there is, Manus’s Discord channel has more than 186,000 members.)
MIT Technology Review was able to obtain access to Manus, and when I gave it a test-drive, I found that using it feels like collaborating with a highly intelligent and efficient intern: While it occasionally lacks understanding of what it’s being asked to do, makes incorrect assumptions, or cuts corners to expedite tasks, it explains its reasoning clearly, is remarkably adaptable, and can improve substantially when provided with detailed instructions or feedback. Ultimately, it’s promising but not perfect.
To put it to the test, I gave Manus two assignments: (1) compile a list of notable reporters covering China tech, and (2) search for two-bedroom property listings in New York City.

Here’s how it did:

Task 1: The first list of reporters that Manus gave me contained only five names, with five “honorable mentions” below them. I noticed that it listed some journalists’ notable work but didn’t do this for others. I asked Manus why. The reason it offered was hilariously simple: It got lazy. It was “partly due to time constraints as I tried to expedite the research process,” the agent told me. When I insisted on consistency and thoroughness, Manus responded with a comprehensive list of 30 journalists, noting their current outlet and listing notable work. (I was glad to see I made the cut, along with many of my beloved peers.)

I was impressed that I was able to make top-level suggestions for changes, much as someone would with a real-life intern or assistant, and that it responded appropriately. And while it initially overlooked changes in some journalists’ employer status, when I asked it to revisit some results, it quickly corrected them. Another nice feature: The output was downloadable as a Word or Excel file, making it easy to edit or share with others.

Manus hit a snag, though, when accessing journalists’ news articles behind paywalls; it frequently encountered captcha blocks. Since I was able to follow along step by step, I could easily take over to complete these, though many media sites still blocked the tool, citing suspicious activity. I see potential for major improvements here—and it would be useful if a future version of Manus could proactively ask for help when it encounters these sorts of restrictions.

Task 2: For the apartment search, I gave Manus a complex set of criteria, including a budget and several parameters: a spacious kitchen, outdoor space, access to downtown Manhattan, and a major train station within a seven-minute walk. Manus initially interpreted vague requirements like “some kind of outdoor space” too literally, completely excluding properties without a private terrace or balcony access. However, after more guidance and clarification, it was able to compile a broader and more helpful list, giving recommendations in tiers and neat bullet points.

The final output felt straight from Wirecutter, containing subtitles like “best overall,” “best value,” and “luxury option.” This task (including the back-and-forth) took less than half an hour—a lot less time than compiling the list of journalists (which took a little over an hour), likely because property listings are more openly available and well-structured online.

Still, it’s not all smooth sailing. Manus can suffer from frequent crashes and system instability, and it may struggle when asked to process large chunks of text. The message “Due to the current high service load, tasks cannot be created. Please try again in a few minutes” flashed on my screen a few times when I tried to start new requests, and occasionally Manus’s Computer froze on a certain page for a long period of time.

It has a higher failure rate than ChatGPT DeepResearch, a problem the team is addressing, according to Manus’s chief scientist, Peak Ji. That said, the Chinese media outlet 36Kr reports that Manus’s per-task cost is about $2, which is just one-tenth of DeepResearch’s cost. If the Manus team strengthens its server infrastructure, I can see the tool becoming a preferred choice for individual users, particularly white-collar professionals, independent developers, and small teams.

Finally, I think it’s really valuable that Manus’s working process feels relatively transparent and collaborative. It actively asks questions along the way and retains key instructions as “knowledge” in its memory for future use, allowing for an easily customizable agentic experience. It’s also really nice that each session is replayable and shareable.

I expect I will keep using Manus for all sorts of tasks, in both my personal and professional lives. While I’m not sure the comparisons to DeepSeek are quite right, it serves as further evidence that Chinese AI companies are not just following in the footsteps of their Western counterparts. Rather than just innovating on base models, they are actively shaping the adoption of autonomous AI agents in their own way.



