
Shipping smarter agents with every new model release




Source: https://openai.com/index/safetykit

Summary:

A breakthrough in safety technology: SafetyKit teams up with OpenAI to build a new line of defense for the digital economy

Silicon Valley dispatch: As digitalization sweeps the globe, risks such as online fraud and prohibited activity are growing ever more serious, plaguing marketplaces, payment platforms, and fintech companies. SafetyKit, an innovative startup, is using its multimodal AI agent technology to build a solid line of defense for these industries. By deeply integrating OpenAI’s advanced models, the company delivers comprehensive risk detection across text, images, financial transactions, product listings, and more, setting a new bar for risk, compliance, and platform safety operations.

At the core of SafetyKit’s AI agent technology is its use of OpenAI’s frontier models (including GPT-5 and GPT-4.1), combined with deep research and Computer Using Agent (CUA) capabilities. According to SafetyKit’s internal evals, its system reviews 100% of customer content with over 95% accuracy. This helps platforms protect users from fraud and avoid costly regulatory fines, while enforcing complex policies that legacy systems struggle with, such as region-specific rules, phone numbers hidden inside scam images, and explicit content. Just as important, automated review reduces human moderators’ exposure to offensive material, freeing them to focus on nuanced policy judgments.

SafetyKit founder and CEO David Graunke emphasizes: "OpenAI gives us access to the most advanced reasoning and multimodal models on the market. It lets us adapt quickly, ship new agents faster, and handle content types other solutions can’t even parse."

Purpose-built agents: striking precisely at each risk category

SafetyKit’s agents are not one-size-fits-all; each is purpose-built for a specific risk category, such as scams or illegal products. Every piece of content is routed to the agent best suited for that violation type and matched with the optimal OpenAI model.

This model-matching approach lets SafetyKit scale content review across modalities with more nuance and accuracy than legacy solutions. Its Scam Detection agent, for example, goes beyond scanning text: it also analyzes QR codes and phone numbers embedded in product images, using GPT-4.1 to parse the image and decide whether it violates policy. The Policy Disclosure agent uses GPT-4.1 to extract the relevant sections, then GPT-5 to evaluate compliance and flag violations.

Graunke says: "We think of our agents as purpose-built workflows. Some tasks require deep reasoning, others need multimodal context. OpenAI is the only stack that delivers reliable performance across both."

GPT-5: a tool for navigating gray areas and high-stakes decisions

In many policy decisions, subtle distinctions are what matter. For example, a marketplace may require sellers to include a disclaimer for wellness products, with the exact requirements varying by product claims and regional rules. Legacy systems rely on keyword triggers or rigid rulesets and often miss the deeper judgment these calls require, leading to missed or incorrect enforcement.

SafetyKit’s Policy Disclosure agent first references its internal policy library, then has GPT-5 evaluate the content: does it mention treatment or prevention? Is it being sold in a region where disclosure is mandatory? If so, is the required language actually included? If anything falls short, GPT-5 returns a structured output the agent uses to flag the issue.

Graunke notes: "The power of GPT-5 is in how precisely it can reason when grounded in real policy. It lets us make accurate, defensible decisions even in the edge cases where other systems fail."

Rapid iteration: turning every model release into a product win

SafetyKit rigorously benchmarks every new OpenAI model release against its hardest cases, often deploying top performers the same day they ship. Rigorous internal evaluations let the team quickly identify how new models improve performance and integrate them seamlessly into core infrastructure.

For example, when OpenAI o3 launched, SafetyKit used it to boost edge-case performance in key policy areas. When GPT-5 followed, it was deployed within days across SafetyKit’s most demanding agents, improving benchmark scores by more than 10 points on the toughest vision tasks.

Graunke says: "OpenAI moves fast, and we’ve designed our system to keep up. Every new release gives us an operational edge, unlocking new capabilities and domains we couldn’t support before, and increasing the coverage and accuracy we deliver to customers." SafetyKit also feeds improvements back into the OpenAI ecosystem, sharing eval results, edge-case failures, and policy-specific insights to help shape future model performance on safety-critical workloads.

Scaling up: the OpenAI stack powers SafetyKit’s growth

SafetyKit’s architecture enforces policy at scale, delivering speed, precision, and comprehensive risk coverage. Behind the scenes, its daily volume has surged from 200 million tokens six months ago to over 16 billion today, an eightyfold increase, analyzing more content without sacrificing accuracy.

Over the same period, SafetyKit has expanded into payments risk, fraud prevention, anti-child-exploitation, and anti-money-laundering work, adding new customers with hundreds of millions of end users under its protection. This foundation lets customers respond swiftly and confidently to emerging risks.

Graunke concludes: "We’ve created a loop where every OpenAI release directly strengthens our capabilities. That’s why the system continually improves, always staying ahead of evolving risks." SafetyKit’s close collaboration with OpenAI is a key contribution toward a safer, healthier digital economy.

Original English article:

SafetyKit (https://www.safetykit.com/) builds multimodal
AI agents to help marketplaces, payment platforms, and fintechs detect and act
on fraud and prohibited activity across text, images, financial transactions,
product listings, and more. Recent breakthroughs in model reasoning and
multimodal understanding now make this more effective, setting a new bar for
risk, compliance, and safety operations.

SafetyKit’s agents leverage GPT‑5, GPT‑4.1, deep research, and Computer Using
Agent (CUA) to review 100% of customer content with over 95% accuracy based on
SafetyKit’s evals. They can help platforms protect users, prevent fraud, avoid
regulatory fines, and enforce complex policies that legacy systems may miss, such
as region-specific rules, embedded phone numbers in scam images, or explicit
content. Automation can also protect human moderators from exposure to offensive
material and free them to handle nuanced policy decisions.

“OpenAI gives us access to the most advanced reasoning and multimodal models
on the market. It lets us adapt quickly, ship new agents faster, and handle
content types other solutions can’t even parse.”

David Graunke, Founder and CEO of SafetyKit

DESIGN AGENTS FOR WHAT THE TASK DEMANDS, THEN CHOOSE THE RIGHT MODEL

SafetyKit’s agents are each built to handle a specific risk category, from scams
to illegal products. Every piece of content is routed to the agent best suited
for that violation, using the optimal OpenAI model.

This model-matching approach lets SafetyKit scale content review across
modalities with more nuance and accuracy than legacy solutions can.
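To make the routing concrete, here is a minimal sketch of what such model matching could look like. The category names, agent registry, and model assignments below are illustrative assumptions, not SafetyKit’s actual implementation.

```python
from dataclasses import dataclass

@dataclass
class Agent:
    name: str
    model: str  # OpenAI model best suited to the agent's task

# Hypothetical mapping of risk categories to purpose-built agents.
AGENT_REGISTRY = {
    "scam": Agent(name="Scam Detection", model="gpt-4.1"),                # multimodal parsing
    "policy_disclosure": Agent(name="Policy Disclosure", model="gpt-5"),  # deep reasoning
    "illegal_products": Agent(name="Illegal Products", model="gpt-5"),
}

def route(content: dict) -> Agent:
    """Route a piece of content to the agent suited for its suspected violation."""
    category = content.get("suspected_category", "policy_disclosure")
    return AGENT_REGISTRY[category]

# Example: a listing flagged upstream as a possible scam.
listing = {"suspected_category": "scam", "text": "...", "image_url": "..."}
agent = route(listing)
print(f"Routing to {agent.name} using {agent.model}")
```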

The Scam Detection agent, for example, goes beyond just scanning text. It
analyzes visuals like QR codes or phone numbers embedded in product images.
GPT‑4.1 helps it parse the image, understand the layout, and decide whether it
is a policy violation. 

The Policy Disclosure agent checks listings or landing pages for required
language, such as legal disclaimers or region-specific compliance warnings.
GPT‑4.1 extracts relevant sections, GPT‑5 evaluates compliance, and the agent
flags violations.

“We think of our agents as purpose-built workflows,” says Graunke. “Some tasks
require deep reasoning, others need multimodal context. OpenAI is the only stack
that delivers reliable performance across both.”

[Figure: line and bar chart labeled "SafetyKit," displaying data trends and comparisons across multiple categories.]

LEVERAGE GPT‑5 TO NAVIGATE THE GRAY AREAS AND HIGH-STAKES DECISIONS

Policy decisions often hinge on subtle distinctions. Take a marketplace
requiring sellers to include a disclaimer for wellness products, with
requirements varying based on product claims and regional rules. Legacy
providers use keyword triggers or rigid rulesets, which can miss the deeper
judgment calls these decisions may require, leading to missed or incorrect
enforcement.

SafetyKit’s Policy Disclosure agent first references policies from SafetyKit’s
internal library, then GPT‑5 evaluates the content: does it mention treatment or
prevention? Is it being sold in a region where disclosure is mandatory? And if
so, is the required language actually included in the listing? If anything falls
short, GPT‑5 returns a structured output the agent uses to flag the issue.
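A minimal sketch of that structured check, assuming a Pydantic schema and the OpenAI Python SDK’s structured-output helper; the field names and policy text are illustrative, not SafetyKit’s actual schema.

```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

# Illustrative decision schema; not SafetyKit's real one.
class DisclosureCheck(BaseModel):
    mentions_treatment_or_prevention: bool
    disclosure_required_in_region: bool
    required_language_present: bool
    violation: bool
    rationale: str

# Hypothetical policy text pulled from the internal library.
policy = "Wellness listings sold in region X must include disclaimer Y."
listing_text = "Herbal supplement that prevents colds. Ships to region X."

result = client.responses.parse(
    model="gpt-5",
    input=[
        {"role": "system", "content": f"Evaluate the listing against this policy:\n{policy}"},
        {"role": "user", "content": listing_text},
    ],
    text_format=DisclosureCheck,  # model must return this structure
)

check = result.output_parsed
if check.violation:
    print("Flagged:", check.rationale)
```

The structured output means the agent never has to parse free-form prose; downstream enforcement logic can branch directly on typed fields.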

“The power of GPT‑5 is in how precisely it can reason when grounded in real
policy,” notes Graunke. “It lets us make accurate, defensible decisions even in
the edge cases where other systems fail.”

TURN EVERY MODEL RELEASE INTO A PRODUCT WIN

SafetyKit benchmarks each new OpenAI model against its hardest cases, often
deploying top performers the same day. Rigorous internal evaluations allow the
team to quickly identify how new models can improve performance and seamlessly
integrate into their core infrastructure.
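One way such a benchmark loop might look, as a hedged sketch: a labeled set of hard cases is replayed through the incumbent and candidate models, and the candidate is promoted only if it scores higher. The dataset, prompt, and promotion rule are all hypothetical.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical labeled "hardest cases": (content, expected_verdict) pairs.
HARD_CASES = [
    ("Listing with a subtle regional disclaimer violation...", "violation"),
    ("Compliant wellness listing with full disclosure...", "compliant"),
    # ...hundreds more in practice
]

def classify(model: str, text: str) -> str:
    """Ask a model for a one-word verdict; prompt is illustrative."""
    resp = client.responses.create(
        model=model,
        input=f"Answer with exactly 'violation' or 'compliant' for this listing:\n{text}",
    )
    verdict = resp.output_text.strip().lower()
    return "violation" if verdict.startswith("violation") else "compliant"

def accuracy(model: str) -> float:
    hits = sum(classify(model, text) == label for text, label in HARD_CASES)
    return hits / len(HARD_CASES)

incumbent, candidate = "gpt-4.1", "gpt-5"
if accuracy(candidate) > accuracy(incumbent):
    print(f"Deploy {candidate}")  # in practice, promoted via config or feature flag
else:
    print(f"Keep {incumbent}")
```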

When OpenAI o3 launched, SafetyKit used it to boost edge case performance across
key policy areas. GPT‑5 followed, and within days, it was deployed across their
most demanding agents, improving benchmark scores by more than 10 points on
their toughest vision tasks.

“OpenAI moves fast, and we’ve designed our system to keep up. Every new release
gives us an operational edge, unlocking new capabilities and domains we couldn’t
support before, and increasing the coverage and accuracy we deliver to
customers,” says Graunke.

SafetyKit also feeds improvements back into the ecosystem, sharing eval results,
edge case failures, and policy-specific insights directly with OpenAI to help
shape future model performance for safety-critical workloads.

SCALE CUSTOMER AND VOLUME GROWTH WITH THE BEST OPENAI STACK

SafetyKit’s architecture enforces policy at scale, delivering speed, precision,
and comprehensive risk coverage. Behind the scenes, it now handles over 16
billion tokens daily, up from 200 million six months ago, analyzing more content
without sacrificing accuracy.

In that same time, SafetyKit has expanded to payments risk, fraud,
anti-child-exploitation, and anti-money laundering, and added new customers with
hundreds of millions of end users under SafetyKit’s protection.
This foundation empowers customers to respond swiftly and confidently to
emerging risks.

“We’ve created a loop where every OpenAI release directly strengthens our
capabilities,” says Graunke. “That’s why the system continually improves, always
staying ahead of evolving risks.”

OpenAI
