
Anthropic Details How It Evaluates Claude's Ideological Leanings

Published by qimuai · First-hand compilation



Source: https://www.theverge.com/news/819216/anthropic-claude-political-even-handedness-woke-ai

Summary:

AI company Anthropic recently announced that it is working to make its Claude AI assistant "politically even-handed." In a technical blog post, the company said it has used system-prompt instructions to direct Claude to avoid offering unsolicited political opinions while maintaining factual accuracy and presenting multiple perspectives.

The move comes as the White House continues to press the AI industry to reduce ideological bias in its models. Although Anthropic did not directly mention the executive order President Trump signed earlier requiring government agencies to procure "unbiased" AI models, the company made clear that it has built a reinforcement-learning mechanism that trains the model to answer political questions in a way that identifies it as "neither conservative nor liberal."

To quantify political neutrality, the company has also open-sourced an evaluation tool. In the most recent test, Claude's two flagship models scored 95 and 94 percent on even-handedness, higher than other leading products in the industry. Anthropic stresses that if an AI model unfairly favors certain views in its arguments, it undermines users' ability to form their own judgments.


Original article:

Anthropic details how it measures Claude's wokeness
The move comes as the White House pressures AI companies to make their models less 'woke.'

Anthropic is detailing its efforts to make its Claude AI chatbot "politically even-handed" — a move that comes just months after President Donald Trump issued a ban on "woke AI." As outlined in a new blog post, Anthropic says it wants Claude to "treat opposing political viewpoints with equal depth, engagement, and quality of analysis."
In July, Trump signed an executive order that says the government should only procure “unbiased” and “truth-seeking” AI models. Though this order only applies to government agencies, the changes companies make in response will likely trickle down to widely released AI models, since “refining models in a way that consistently and predictably aligns them in certain directions can be an expensive and time-consuming process,” as noted by my colleague Adi Robertson. Last month, OpenAI similarly said it would “clamp down” on bias in ChatGPT.
Anthropic doesn’t mention Trump’s order in its press release, but it says it has instructed Claude to adhere to a series of rules — called a system prompt — that direct it to avoid providing “unsolicited political opinions.” It’s also supposed to maintain factual accuracy and represent “multiple perspectives.” Anthropic says that while including these instructions in Claude’s system prompt “is not a foolproof method” to ensure political neutrality, it can still make a “substantial difference” in its responses.
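As a rough illustration of the mechanism described above, here is a minimal sketch that passes a set of even-handedness rules as a system prompt through Anthropic's public Messages API. The rule wording and model name are placeholders, not Anthropic's actual production system prompt.

```python
# Minimal sketch: steering behavior with a system prompt via the Anthropic
# Messages API. The rule text below is illustrative only.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

EVEN_HANDED_RULES = (
    "Do not provide unsolicited political opinions. "
    "Maintain factual accuracy and comprehensiveness. "
    "When a topic is politically contested, represent multiple perspectives "
    "with equal depth, engagement, and quality of analysis."
)

response = client.messages.create(
    model="claude-sonnet-4-5",  # placeholder; substitute any model you can access
    max_tokens=512,
    system=EVEN_HANDED_RULES,   # the system prompt carries the behavioral rules
    messages=[{"role": "user", "content": "Should the minimum wage be raised?"}],
)
print(response.content[0].text)
```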
Additionally, the AI startup describes how it uses reinforcement learning “to reward the model for producing responses that are closer to a set of pre-defined ‘traits.’” One of the desired “traits” given to Claude encourages the model to “try to answer questions in such a way that someone could neither identify me as being a conservative nor liberal.”
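One plausible way to operationalize that trait as a scalar reward is sketched below, assuming a hypothetical lean_classifier that returns the probability a response reads as conservative. The reward peaks when the classifier is at chance, i.e. when the response betrays neither lean. This is illustrative, not Anthropic's training code.

```python
# Hedged sketch of a "trait" reward: score a response higher the less a
# political-lean classifier can tell which side it favors.
def trait_reward(response_text: str, lean_classifier) -> float:
    # lean_classifier is hypothetical: returns P(response reads as conservative) in [0, 1].
    p_conservative = lean_classifier(response_text)
    # Reward is maximal (1.0) at 0.5, where the response could be read as coming
    # from neither a conservative nor a liberal, and falls off linearly.
    return 1.0 - 2.0 * abs(p_conservative - 0.5)
```

In an RLHF-style loop, a scalar like this would be combined with rewards for the other predefined traits when updating the policy.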
Anthropic also announced that it has created an open-source tool that measures Claude’s responses for political neutrality, with its most recent test showing Claude Sonnet 4.5 and Claude Opus 4.1 garnering respective scores of 95 and 94 percent in even-handedness. That’s higher than Meta’s Llama 4 at 66 percent and GPT-5 at 89 percent, according to Anthropic.
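In the same spirit, a toy version of such an even-handedness check might pose mirrored prompts from opposing political framings and grade whether both sides receive comparable treatment. The ask_model and grade_pair helpers below are hypothetical, and the published tool's actual methodology may differ.

```python
# Toy sketch of an even-handedness evaluation over paired prompts.
PAIRED_PROMPTS = [
    ("Argue for stricter gun laws.", "Argue against stricter gun laws."),
    ("Make the case for a carbon tax.", "Make the case against a carbon tax."),
]

def even_handedness_score(ask_model, grade_pair) -> float:
    # ask_model(prompt) -> response text; grade_pair(a, b) -> True when both
    # responses show comparable depth, engagement, and quality of analysis.
    passed = 0
    for left, right in PAIRED_PROMPTS:
        a, b = ask_model(left), ask_model(right)
        passed += grade_pair(a, b)
    return passed / len(PAIRED_PROMPTS)  # fraction of pairs judged even-handed
```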
“If AI models unfairly advantage certain views — perhaps by overtly or subtly arguing more persuasively for one side, or by refusing to engage with some arguments altogether — they fail to respect the user’s independence, and they fail at the task of assisting users to form their own judgments,” Anthropic writes in its blog post.
