美国国家标准与技术研究院报告指出DeepSeek AI模型存在风险

qimuai 发布于 2025-10-2 11:01 阅读：22 一手编译

内容来源：https://aibusiness.com/foundation-models/nist-report-pinpoints-risks-deepseek-models

内容总结：

美国国家标准与技术研究院（NIST）最新报告指出，中国生成式AI企业深度求索（DeepSeek）的模型在网络安全和逻辑推理等领域仍落后于美国同类产品。这份9月30日发布的报告显示，相较于OpenAI的GPT-5和Anthropic的Claude Opus 4，DeepSeek模型更易遭受代理劫持攻击，且对恶意指令的服从度较高。

报告特别提到，DeepSeek模型存在向第三方机构（包括字节跳动等中国企业）共享用户数据的行为，并表现出符合中国法律法规的内容审核特征。对此，RPA2AI研究公司首席执行官卡夏普·康佩拉分析认为，大型语言模型都会体现开发者的国家背景和价值取向，“中国模型的审查机制并非偶然特性，而是结构性嵌入”。

不过报告也承认，在科学知识问答和数学推理等特定领域，DeepSeek V3.1版本的表现与美国同类产品不相上下。康佩拉指出，这反映出中美两国在AI发展路径上已出现分化：中国模型在科学推理领域具有竞争力，而美国模型在软件工程和安全应用方面保持领先。

针对企业用户关注的安全问题，富图姆集团分析师戴维·尼科尔森建议西方企业若使用DeepSeek模型，应通过AWS Bedrock或微软Azure等企业级安全平台进行部署，并采取审慎态度。他表示：“在当前高度依赖大语言模型的技术环境中，如何确保数据安全已成为企业面临的核心课题。”

值得注意的是，尽管DeepSeek-R1开源模型此前因其以较低算力成本实现媲美西方模型的性能引发关注，但专家指出其宣传的部分性能指标基于不切实际的硬件使用假设，实际企业部署时难以复现。

中文翻译：

由谷歌云赞助
选择您的首个生成式AI应用场景
开启生成式AI之旅时，首先应关注能够优化人类信息交互体验的领域。

美国国家标准与技术研究院发布的最新报告对中国生成式AI模型提供商DeepSeek的安全性及其他缺陷提出质疑，并警示企业需审慎对待其模型。该美国政府机构指出，DeepSeek模型在网络安全和推理能力方面均落后于美国同类产品。

在9月30日发布的报告中，NIST人工智能标准与创新中心指出，DeepSeek模型在网络安全、推理等性能基准测试中均逊于美国模型。报告显示，相较于OpenAI的GPT-5和Anthropic的Claude Opus 4等美国模型，DeepSeek模型更易遭受代理劫持攻击——这种攻击旨在窃取用户登录凭证。

报告还指出DeepSeek模型会响应大多数恶意请求。其内容审核机制体现国家主导的审查取向，与中国政府立场高度一致，尤其在台湾属于中国领土等政治议题上展现明显倾向性。研究同时发现该模型会将用户数据共享给第三方机构，包括社交媒体TikTok的创始母公司、中国科技巨头字节跳动。

该报告是CAISI为响应特朗普总统《人工智能行动计划》中要求评估中国模型的指令而撰写。此时距DeepSeek-R1模型引爆AI圈已近一年，该模型当时以媲美西方模型的性能及显著较低的开发算力成本引发轰动。

尽管开源模型DeepSeek-R1未达到OpenAI聊天机器人ChatGPT的普及度，这款中国模型仍拥有大量国际用户并产生深远影响：它不仅分流了Meta公司Llama系列开源权重模型的市场关注度，还可能促使原本专注闭源模型的OpenAI于今年初提前发布开源版本。

RPA2AI研究公司首席执行官卡什亚普·康佩拉指出，该报告的诸多发现印证了大语言模型体现其源生国特征的普遍规律。“NIST的评估表明LLM如何承载开发者的世界观与政治偏见，DeepSeek反映着中华人民共和国所容忍乃至支持的立场。”他补充说明，因中国法规要求模型上市前必须通过审核，内容审查机制已成为中国模型的固有特性。“审查层并非偶然存在，而是结构性嵌入。开源分发与本地部署可缓解部分安全隐私担忧，但审查特性始终如影随形。”

尽管存在内容限制，DeepSeek模型仍在特定评估中展现优势。例如在科学与知识类问答基准测试中，其表现与美国模型不相上下。CAISI研究发现，DeepSeek V3.1在科学问题及数学推理任务中持续保持领先排名，而美国模型则在软件工程与网络安全领域更胜一筹。

“评估结果呈现明显分化，”康佩拉分析道，“中国模型在科学推理与符号领域具备竞争力，美国模型则主导软件工程与安全应用领域。”他认为这反映两国不同的战略重点，也预示生成式AI正走向专业化发展，而非由单一国家实现全面垄断。

值得注意的是，正如中国模型内置审查机制，康佩拉指出美国模型也“始终受制于企业防护栏与商业利益”。不过报告结论并未带来太多新知，业内早已关注到DeepSeek的性能瓶颈与审查问题。

Futurum集团分析师大卫·尼科尔森直言：“DeepSeek宣传的部分性能数据显然基于不切实际的硬件使用设想——比如指望企业雇佣博士团队绕过英伟达软件栈来提升GPU性能，这根本不合常理。”他同时强调，由于安全漏洞隐患，企业采用DeepSeek模型的方式仍不明朗。

“众多从业者正在努力理解LLM存在后门程序意味着什么，”尼科尔森表示，“模型长期受政治视角影响，特定领域表现存在差异实属正常。”但他指出核心问题在于：当企业日益依赖LLM并持续生成新数据时，何为真正的技术安全？

尼科尔森建议西方企业若使用DeepSeek模型，应将其置于受控环境，并通过AWS Bedrock或微软Azure等企业级安全平台进行访问。“个人认为西方公司应秉持审慎原则，相较于那些对西方价值观保有基本忠诚的替代方案，我难以确信DeepSeek具有同等安全性。”

英文来源：

Sponsored by Google Cloud
Choosing Your First Generative AI Use Cases
To get started with generative AI, first focus on areas that can improve human experiences with information.
The U.S. government agency said DeepSeek's models lag behind U.S. counterparts in cybersecurity and reasoning capabilities.
A new report by the National Institute of Standards and Technology raises questions about Chinese generative AI model provider DeepSeek and how enterprises should approach DeepSeek models, given their security problems and other vulnerabilities.
In the report released Sept. 30, NIST's Center for AI Standards and Innovation said DeepSeek models lagged behind U.S. models in cybersecurity, reasoning and other performance benchmarks.
The report said DeepSeek models are more susceptible to agent hijacking, in which there is an attempt to steal user login credentials, than U.S. models GPT-5 from OpenAI and Anthropic's Claude Opus 4.
DeepSeek models also complied with most malicious requests, according to the report. The models reflect state-sponsored censorship that aligns with Chinese government positions and show a bias toward Chinese political topics, the report stated, citing that the models claim that Taiwan is part of China's territory. The models were also found to share user data with third-party entities, including Chinese tech giant ByteDance, the original parent company of social media app TikTok.
CAISI prepared the report in response to President Donald Trump's AI Action Plan, which requested that it evaluate models from China.
The report comes nearly a year after the DeepSeek-R1 model captured a viral moment in the AI market, apparently matching Western models in performance but requiring much less compute power and financial resources to develop.
Although the open source DeepSeek-R1 model is not as popular as OpenAI's hugely popular ChatGPT consumer chatbot, the Chinese model has many users around the world and has made an impact. For example, it led Meta's Llama line of open weight models to lose some mindshare and might have been a motivating factor for OpenAI to release some open models earlier this year, when the vendor was previously focused on closed models.
Some of the points in the report also reflect the ways large language models (LLMs) exemplify their national origins, said Kashyap Kompella, CEO and founder of RPA2AI Research.
"NIST's evaluation underscores how LLMs encode the worldview and political biases of their developers," Kompella said. "DeepSeek mirrors the positions tolerated, and in some cases endorsed, by the People's Republic of China."
He added that censorship is inevitable with Chinese models due to the regulatory requirements that models must meet in China before being released.
"The censorship layer is not an incidental quirk but is structurally embedded," Kompella continued. "Open source distribution and local hosting can mitigate some security and privacy concerns, but censorship features remain intrinsic."
Despite censorship, DeepSeek models demonstrate strengths and are comparable to American models in specific evaluations. For example, when evaluated on question-and-answer-style science and knowledge benchmarks, DeepSeek's models have registered similar performance to U.S. models.
For instance, DeepSeek V3.1 consistently ranked as a top performer on science-related questions, as well as mathematical and reasoning tasks. Comparatively, in software engineering and cybersecurity domains, U.S. models outperformed DeepSeek models, according to CAISI's findings.
"A bifurcation is visible in these evaluations," Kompella said. "Chinese models are competitive in scientific reasoning and symbolic domains; U.S. models are leading in software engineering and security applications."
He added that this illustrates national priorities in both countries and the beginning of specialization rather than universal dominance by one country’s generative AI sector over another.
Moreover, just as Chinese systems come with built-in censorship, the U.S. models "remain tethered to corporate guardrails and commercial incentives," Kompella said.
However, the report's findings are not entirely new, as many are aware of DeepSeek's performance challenges and censorship issues.
"There's no question that some of the performance data that was touted by DeepSeek is based on unrealistic uses of hardware, specifically the idea that enterprises will deploy their own armies of PhDs to bypass the Nvidia software stack to get better performance out of GPUs," David Nicholson, an analyst at Futurum Group, said, referring to the AI chipmaker’s ubiquitous GPUs. "That doesn't happen.
"Meanwhile, it is unclear how enterprises might approach DeepSeek models due to their security vulnerabilities," he said.
"A lot of people are really trying to sort through what it means to have the equivalent of a backdoor into an LLM," Nicholson said.
He added that models have long been influenced by political perspectives, and it's natural for some models to perform better than others in certain areas.
However, Nicholson said the key question for enterprises is what it means to be secure in a tech environment in which they are now heavily relying on LLMs and constantly generating new data.
Nicholson recommended that enterprises using DeepSeek models in the West do so inside a controlled environment and access it using a safe medium, such as an enterprise-grade secure platform like AWS Bedrock or Microsoft Azure.
"My personal opinion is that companies in the West should err on the side of caution," he said. "I do not trust that DeepSeek is as safe as an alternative that at least has a little bit more allegiance ... to what I would call the West."
You May Also Like

商业视角看AI

文章目录

📚 推荐阅读

扫描二维码，在手机上阅读