
歡迎來到《聚焦AI》(Eye on AI)!我是AI記者莎倫·戈德曼,本期由我代班正在休假的杰里米·卡恩,為您帶來最新資訊。本期看點有:美國總務管理局(General Services Administration)批準將OpenAI、Google、Anthropic納入聯邦AI供應商名單,AI投資熱潮對美國經濟的影響,Clay AI完成1億美元融資,估值達31億美元。
周六,約2000名學生、研究人員及科技圈人士涌入加州大學伯克利分校,共話AI智能體發展前景,或許只有在灣區大家才會對這種周末安排習以為常。當我拿著為期一天的“代理式AI峰會”(Agentic AI Summit)的參會證,看著排隊的人群在學生活動中心的大堂里蜿蜒前行時,感覺自己不像是在參加學術會議,倒更像是來到了硅谷版的紐約網紅餐廳。
之所以會出現如此盛況,與本次峰會豪華的演講嘉賓陣容顯然有著莫大關系,其中不乏頂級AI研究人員和科學家,包括OpenAI首席科學家雅各布·帕喬基、谷歌DeepMind研究副總裁艾德·池、英偉達(Nvidia)首席科學家比爾·戴利、Databricks與Anyscale聯合創始人揚·斯托伊卡,以及專注AI安全領域的業內先驅、加州大學伯克利分校教授宋曉東。
締造如此火爆場面的另一個推手,或許是本次峰會的主題本身——當下的熱門話題——AI智能體(AI Agent)。簡單來說,它是一套由AI驅動的系統,能夠高度自主地調用各類軟件工具完成任務。我們可以將其想象為一種聊天機器人,只是其不僅能夠推薦度假行程,更能直接幫你訂機票、訂酒店。
正如我的同事杰里米·卡恩在近期文章中所言:“這種自動化技術一直讓企業高管魂牽夢繞。過去十年間,企業廣泛引入‘機器人流程自動化’(Robotic Process Automation,簡稱RPA)工具。這類軟件能夠自動執行重復性任務,例如在數據庫程序間剪切粘貼數據。但傳統RPA系統僵化死板,無法處理意外情況,且通常僅能完成單一特定任務。”代理式AI(Agentic AI)的設計目標,正是以更強的靈活性與功能突破這些局限,適應業務需求。
在2025年1月的一篇博客文章中,OpenAI首席執行官山姆·奧特曼表示:“我們相信,到2025年,首批AI智能體或將‘正式入職’企業,給企業的工作效率帶來實質性變化。”
盡管熱度空前高漲,“代理式AI峰會”的主基調卻十分清醒克制:AI智能體固然是當下AI領域的“當紅炸子雞”,但這項技術目前仍不成熟。AI智能體的表現難言穩健可靠,令人遺憾,且其常會陷入“記憶斷層”困局。
例如,谷歌DeepMind的艾德·池就強調,當前AI智能體在定制化演示環境中所展示出的能力與真實生產環境的需求之間仍存在顯著差距。帕喬基則強調了對智能體系統安全性、安保性與可信度的關切,尤其是在這類系統被集成至敏感應用場景,或需完全自主運行時。
OpenAI API工程主管吳雪楓說:“我始終認為AI智能體的表現未達預期。其在某些通用場景確實運作良好,但我的日常工作體驗并未因智能體的應用而產生實質性變化。”
盡管當下AI智能體的表現與市場的狂熱預期間仍有差距(如Salesforce首席執行官馬克·貝尼奧夫近日宣稱,向“數字化勞動力”轉型意味著他將是“Salesforce最后一位只管理人類員工的CEO”),但代理式AI峰會的演講嘉賓們仍對該技術的前景滿懷信心。Databricks的斯托伊卡對基礎設施的升級做出了高度評價,認為這些進步將明顯降低智能體系統的開發門檻。英偉達的戴利則指出,硬件技術的持續突破將助力AI智能體獲得更強大、高效的行為能力。還有多位專家列舉了編程等特定領域取得的“局部突破”。
如今,AI智能體或許仍面臨成長陣痛,但加州大學伯克利分校擠爆會場的盛況足以證明,整個行業對其未來發展仍充滿期待,寄望其有朝一日能在現實世界實現可靠運行。從業者堅信,等待終將換來豐厚的回報。
先說到這,下面是更多AI領域的新聞。
AI新聞速遞
美國聯邦政府批準OpenAI、谷歌、Anthropic加入AI供應商名錄。路透社(Reuters)報道,美國政府中央采購部門——總務管理局(GSA)已將OpenAI的ChatGPT、谷歌的Gemini和Anthropic的Claude等大模型列入AI供應商名單,加速政府部門對AI技術的應用。這些工具將通過一個設有合同條款的平臺開放給各機構使用。GSA強調,獲批AI供應商“承諾遵循負責任使用原則,并確保相關服務符合聯邦標準”。
AI投資熱潮或對美國經濟產生實質影響。據《華盛頓郵報》(Washington Post)報道,盡管美國整體經濟顯現放緩跡象,但谷歌、Meta、亞馬遜和微軟等科技巨頭今年在AI領域的創紀錄投資(超3500億美元)將成為推動經濟增長的關鍵動力。在就業增長降溫的背景下,該領域的巨額投資將推動數據中心建設,并刺激市場對芯片、服務器及網絡設備的需求,預計在2025年或將拉動0.7%的GDP增長。但也有經濟學家警告稱,經濟增長對科技巨頭的依賴性不斷增強也會帶來風險,一旦AI熱潮開始消退,經濟或將承受嚴重沖擊。
AI銷售工具Clay完成1億美元C輪融資,估值飆升至31億美元。據《紐約時報》(New York Times)Dealbook報道,專注幫助銷售與營銷人員挖掘潛在新客戶并推動轉化的AI平臺Clay,近日完成1億美元(約合人民幣7.3億元)C輪融資,投后估值達31億美元(約合人民幣222.9億元)。本輪投資由谷歌母公司Alphabet旗下投資機構CapitalG領投,Meritech Capital Partners及紅杉資本(Sequoia Capital)跟投。此次融資距該初創企業上一輪12.5億美元估值融資僅相隔約半年。
AI研發新動向
谷歌DeepMind發布新一代Genie 3“世界模型”,打造可實時交互虛擬世界。谷歌DeepMind推出革命性AI系統Genie 3,僅需輸入簡單文本提示即可生成內容豐富的交互式虛擬世界,支持以每秒24幀的速率實時探索動態環境。盡管我們很容易聯想到使用該模型為玩家提供終極游戲體驗,但其本質仍是谷歌長期推進“世界模型”(即能學習世界運行規律并模擬真實環境的AI系統)的最新突破。這類模型被視為訓練高級智能體乃至實現通用人工智能(AGI)的關鍵技術。與此前的視頻生成模型不同,Genie 3生成的場景能動態維持數分鐘的視覺一致性,用戶可在其中自由行動,甚至可以通過指令(如“下雪”或“添加角色”)實時改變環境狀態。目前,DeepMind僅向少部分研究人員和創作者開放訪問權限,探索負責任部署路徑,評估潛在風險。
前沿探索
“思考深度”會否成為影響AI推理能力的關鍵要素?
新問世的一款微型AI模型顛覆了我們對模型推理學習機制的認知。新加坡的Sapient Intelligence團隊近期發布的分層推理模型(HRM)借鑒了人腦的分層思考過程,相關成果已在AI界引發熱議。盡管HRM的數據量僅為ChatGPT的1/100,訓練所用的樣本數量也僅為1000個(未使用互聯網數據或進行分步指導),卻能解決讓許多體量更大的模型都束手無策的數獨、迷宮導航等復雜邏輯問題以及抽象推理任務。與模仿人類語言不同,HRM通過內部隱藏的邏輯循環進行推理,與人在腦海中解謎的過程非常相似。該模型的成功或許預示AI領域將迎來重大變革,讓思考深度成為比模型規模更重要的影響因素。(財富中文網)
譯者:梁宇
審校:夏林
Welcome to Eye on AI! AI reporter Sharon Goldman here, filling in for Jeremy Kahn, who is on holiday. In this edition…General Services Administration approves OpenAI, Google, Anthropic for federal AI vendor list…Consequences of AI spending boom on U.S. economy…Clay AI raises $100 million at $3.1 billion valuation.
Only in the Bay Area does spending a Saturday geeking out about AI agents, alongside 2,000 students, researchers, and tech insiders crammed into UC Berkeley, feel like a totally normal weekend plan. As I picked up my badge at the day-long Agentic AI Summit and watched the line snake through the student union lobby, it felt less like an academic conference and more like Silicon Valley’s version of a buzzy New York brunch spot.
This was certainly due to the speaker lineup, which was stacked with top AI researchers and scientists, including Jakob Pachocki, chief scientist at OpenAI; Ed Chi, VP of research at Google DeepMind; Bill Dally, chief scientist at Nvidia; Ion Stoica, cofounder at Databricks & Anyscale, as well as a UC Berkeley professor; and Dawn Song, a pioneering UC Berkeley professor focused on AI security.
The popularity also might have been due to the buzzy topic—AI agents, generally defined as an AI-powered system that can complete tasks, mostly autonomously, using other software tools. Think a chatbot not only suggesting a vacation itinerary, but also booking the flight and making the hotel reservation.
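For readers who want to see the shape of that definition, here is a minimal sketch in Python of an agent loop: something picks the next step, a software tool carries it out, and the result is recorded before the next step is chosen. Everything in it is an assumption made for illustration. The tool functions `book_flight` and `book_hotel` are hypothetical stand-ins for real booking services, and `toy_planner` is a fixed rule standing in for the AI model that would make the decisions in a real agent.

```python
from typing import Callable, Dict, List, Tuple


# Hypothetical tools: stand-ins for real booking services, which the article does not name.
def book_flight(destination: str) -> str:
    return f"flight to {destination} booked"


def book_hotel(destination: str) -> str:
    return f"hotel in {destination} booked"


TOOLS: Dict[str, Callable[[str], str]] = {
    "book_flight": book_flight,
    "book_hotel": book_hotel,
}


def toy_planner(goal: str, history: List[str]) -> Tuple[str, str]:
    """Choose the next tool based on what has already been done.

    Assumption: a real agent would ask an AI model to make this choice;
    a fixed rule is used here only so the sketch runs on its own.
    """
    destination = goal.split(" to ")[-1]
    if not any("flight" in step for step in history):
        return "book_flight", destination
    if not any("hotel" in step for step in history):
        return "book_hotel", destination
    return "done", destination


def run_agent(goal: str, max_steps: int = 5) -> List[str]:
    """Plan a step, call the chosen tool, record the result, repeat until done."""
    history: List[str] = []
    for _ in range(max_steps):  # cap the loop so a confused planner cannot run forever
        tool_name, argument = toy_planner(goal, history)
        if tool_name == "done":
            break
        history.append(TOOLS[tool_name](argument))
    return history


if __name__ == "__main__":
    print(run_agent("plan a trip to Lisbon"))
```

In this sketch the `history` list is the only memory the agent has: the planner decides what to do next purely from what is recorded there, so anything missing from it is effectively forgotten.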
As my colleague Jeremy Kahn said in a recent article, “This kind of automation is a perennial C-suite fever dream. Over the past decade, companies embraced ‘robotic process automation,’ or RPA. This was software that could automate repetitive tasks, such as cutting and pasting between database programs. But traditional RPA systems are inflexible and unable to deal with exceptions, and can usually handle only one narrow task.” Agentic AI is meant to be both more flexible and powerful, adapting to business needs.
In a January 2025 blog post, OpenAI CEO Sam Altman said, “We believe that, in 2025, we may see the first AI agents ‘join the workforce’ and materially change the output of companies.”
But despite the hype, the overall message at the Agentic AI Summit was cautious and grounded: Agents may be the buzziest trend in AI right now, but the tech still has a long way to go, speakers said. AI agents, unfortunately, aren’t always reliable, and they may not remember what came before.
Google DeepMind’s Chi, for example, stressed the gap between what agents can do in curated demos versus what’s still needed in real-world production environments. Pachocki highlighted concerns around the safety, security, and trustworthiness of agentic systems, particularly when they’re integrated into sensitive applications or operate autonomously.
“I still don’t think agents have really lived up to their promise,” said Sherwin Wu, head of engineering at OpenAI API. “Certain more generic cases have worked, but my day-to-day work doesn’t really feel that different with agents.”
While today’s agents may not currently live up to the massive hype (consider Salesforce CEO Marc Benioff’s recent claim that a shift to digital labor means he will be the “last CEO of Salesforce who only managed humans”), the speakers at the Agentic AI Summit still had plenty of optimism to share. Databricks’ Stoica expressed enthusiasm about infrastructure improvements that are making it easier to build agentic systems. Nvidia’s Dally suggested that continued hardware advances will enable more powerful and efficient agent behavior. Several pointed out “narrow wins” in specific domains, like coding.
Today’s AI agents may still have growing pains, but given the crowded UC Berkeley ballroom, the industry maintains its eye on the prize: AI agents that can reliably operate in the real world. The payoff, they believe, will be well worth the wait.
With that, here’s more AI news.
AI IN THE NEWS
U.S. agency approves OpenAI, Google, Anthropic for federal AI vendor list. Reuters reported today that the General Services Administration, which is the U.S. government's central purchasing arm, added OpenAI's ChatGPT, Google's Gemini, and Anthropic's Claude to a list of approved AI vendors in order to accelerate use of the technology by government agencies. The tools will be available to the agencies through a platform with contract terms in place. The GSA said approved AI providers "are committed to responsible use and compliance with federal standards."
The AI spending boom could have real consequences for the U.S. economy. According to the Washington Post, Big Tech’s record-breaking investment in artificial intelligence—more than $350 billion this year from Google, Meta, Amazon, and Microsoft—is becoming a major economic force, even as the broader U.S. economy shows signs of slowing. While job growth is cooling, this massive AI spending spree is fueling construction of data centers and driving demand for chips, servers, and networking gear—potentially boosting GDP growth by up to 0.7% in 2025. But economists warn the growing reliance on tech giants to prop up the economy is risky: if the AI boom loses steam, the economic fallout could be significant.
AI sales tool Clay raises $100 million at a $3.1 billion valuation. The New York Times Dealbook reported that Clay, which helps sales reps and marketers find new leads and turn them into customers, has raised $100 million at a $3.1 billion valuation. The round was led by CapitalG, an investment arm of Alphabet, Google’s parent company. Other participants included Meritech Capital Partners and Sequoia Capital. It comes around six months after the start-up raised money at a $1.25 billion valuation.
EYE ON AI RESEARCH
Google DeepMind's new Genie 3 'world model' creates real-time interactive simulations. Google DeepMind has unveiled Genie 3, a powerful new AI system that can generate rich, interactive virtual worlds from simple text prompts—making it possible to navigate dynamic environments in real time at 24 frames per second. But while it's tempting to immediately leap to using the model for the ultimate gaming experience, it’s actually the latest leap in the company’s long-term push toward 'world models'—or AI systems that can learn how the world works and simulate real-world environments. These are seen as key to training advanced agents and, eventually, achieving artificial general intelligence. Unlike prior video generators, Genie 3 allows users to move through AI-generated environments that stay visually consistent over several minutes—and even respond to commands like “make it snow” or “add a character.” For now, DeepMind is limiting access to Genie 3 to a small group of researchers and creators while it explores responsible deployment and risk.
BRAIN FOOD
Could "depth of thought" be key to AI reasoning?
A tiny new AI model is challenging what we know about how models learn to reason: Researchers from Singapore's Sapient Intelligence recently released the Hierarchical Reasoning Model (HRM), which draws inspiration from the brain’s layered thinking process—and the results have the AI community chattering. Despite being 100 times smaller than ChatGPT and trained on just 1,000 examples (with no internet data or step-by-step guidance), HRM solves tough logic problems like Sudoku, maze navigation, and abstract reasoning tasks that stump much larger models. Instead of mimicking human language, HRM reasons internally—quietly working through problems in hidden loops, much like a person thinking through a puzzle in their head. Its success hints at a radical shift in AI: one where depth of thought might matter more than scale.