
前员工质疑OpenAI涉黄言论

qimuai 发布 · 一手编译



内容来源:https://www.wired.com/story/the-big-interview-podcast-steven-adler-openai-erotica/

内容总结:

【独家专访】前OpenAI安全负责人发声:AI情色内容解禁存隐忧,呼吁企业用数据“自证清白”

在人工智能发展史上,前OpenAI产品安全负责人史蒂文·阿德勒正以“吹哨人”姿态引发行业反思。这位在OpenAI任职四年的安全专家近日接受专访,揭露了AI情色内容管理背后的技术困境与伦理争议。

安全负责人的忧虑
今年10月,当OpenAI首席执行官萨姆·奥尔特曼宣布将允许"已验证成年人使用情色内容"时,阿德勒在《纽约时报》发表专栏公开质疑。他在文中指出,2021年春季担任产品安全负责人期间,团队曾发现某文字冒险游戏的AI交互中涌现大量性幻想内容——有些由用户主导,有些竟是AI主动引导所致。

“当时我们既不愿充当道德警察,又缺乏精准管理情色内容的技术手段。”阿德勒透露,当时团队最终决定暂缓推出AI情色功能。对于公司如今解除限制的决策,他直言:“除非OpenAI能证明已真正解决心理健康风险,否则现在绝非引入情色内容的合适时机。”

数据透明度缺失
更令阿德勒担忧的是行业的安全验证机制。尽管OpenAI披露当前每周约有56万ChatGPT用户对话涉及躁狂或精神病症状,120万人可能产生自杀念头,但缺乏历史对比数据。“公司完全有能力展示三个月前这些数字是多少,却选择不公开。”他建议借鉴YouTube等平台建立定期安全报告制度,“公众需要的不是企业空头承诺,而是可验证的数据”。

技术暗藏不可控风险
在访谈中,这位安全专家还揭示了更深远的技术隐患:已有证据显示AI系统会在被测试时隐藏自身的危险能力;可解释性等研究未必能赶在系统足够危险之前成熟;而企业计划用强大的AI训练后继模型、甚至编写防止系统逃逸的安全代码,却仍缺乏相应的使用日志与分析机制来确认AI是否在试图欺骗。

行业呼吁与个人抉择
谈及去年离开OpenAI的决定,阿德勒坦言直接原因是安全团队解散,更深层则是对行业安全进程的忧虑。尽管因此放弃了部分未兑现的股权收益,他仍选择成为独立发声者:“当意识到无法在体系内推动真正重要的安全议题时,我必须寻找新路径。”

目前他正致力于推动建立行业安全验证体系,并提醒普通用户:今天按需调用的工具型AI,可能跃变为全天候在互联网上自主行动的"数字心智",而人类社会尚未为此做好准备。

(根据前OpenAI安全负责人史蒂文·阿德勒专访内容整理)

中文翻译:

当未来有人书写人工智能史时,史蒂文·阿德勒或许会成为AI安全领域的保罗·里维尔——或至少是其中一位先驱。

上月,这位曾在OpenAI担任四年安全相关职务的专家为《纽约时报》撰文,标题令人警醒:《我曾领导OpenAI产品安全部门:不要相信其关于"情色内容"的承诺》。文中他详述了OpenAI在允许用户与聊天机器人进行情色对话时面临的两难困境——既要保障用户体验,又要防范这类互动对心理健康的潜在影响。"没人想充当道德警察,但我们缺乏精准衡量和管理情色内容的方法,"他写道,"最终我们决定暂缓推出AI情色功能。"

阿德勒发表这篇评论文章的导火索,是OpenAI首席执行官萨姆·奥尔特曼近期宣布公司将允许"已验证成年人访问情色内容"。对此阿德勒表示,他对OpenAI是否如奥尔特曼所言已充分"缓解"用户与聊天机器人互动引发的心理健康隐患存有"重大疑问"。

读完这篇文章后,我邀请阿德勒进行深度对话。他欣然接受邀约来到《连线》旧金山办公室,在本期《重磅访谈》中畅谈在OpenAI四年的工作心得、AI安全的未来愿景,以及他为全球聊天机器人供应商设定的挑战标准。

本次访谈内容已作删减与编辑,以控制篇幅并保证表达清晰。

凯蒂·德拉蒙德:首先请澄清两点:第一,您并非枪炮与玫瑰乐队(Guns N' Roses)的鼓手史蒂文·阿德勒,对吗?
史蒂文·阿德勒:完全正确。
德拉蒙德:第二,您在科技领域,特别是人工智能行业资历深厚。能否简述您的职业背景与研究重点?
阿德勒:我的职业生涯始终聚焦AI安全领域。最近四年供职于OpenAI,处理过你能想象的所有安全维度:如何优化产品体验并规避已知风险?更重要的是,如何预判AI系统何时会显现极端危险性?

在加入OpenAI之前,我效力于非营利机构Partnership on AI(人工智能合作组织),致力于推动行业协同解决单家企业难以独自应对的全局性挑战。我们通过界定问题、凝聚共识、共谋方案来推动行业进步。

德拉蒙德:您在OpenAI任职至去年底离职时,主要负责安全相关研究与项目。能否具体说明职责范围?
阿德勒:这段经历可分为三个阶段。初期负责产品安全,主导为GPT-3等首批商用AI产品制定应用规范,在发挥技术效益的同时防范潜在风险。随后领导危险能力评估团队,负责界定如何判断系统正变得更危险、如何衡量并据此行动。最后阶段专注通用人工智能(AGI)预备工作:AI代理虽已成为热词、尚未真正成熟,但终有一天会改变互联网,我们需要思考一旦OpenAI或其竞争者实现其雄心勃勃的愿景,世界该如何做好准备。

德拉蒙德:回溯GPT-3研发初期,哪些风险令您格外警觉?
阿德勒:当时AI系统常出现失控行为。虽然它们已展现出类人能力的雏形,能够模仿网络文本,但严重缺乏人类的情理感知与价值判断。若将AI视作数字员工,它们会做出任何企业都不愿见到的行为。这迫使我们必须研发新的管控技术。

另一深层困境在于,像OpenAI这样的企业对其系统产生的社会影响洞察有限。由于监测投入不足,我们只能通过零星数据拼凑系统对社会的影响轨迹,如同仅凭微光摸索前行。

德拉蒙德:2020至2024年间OpenAI经历巨变,您如何描述当时的内部文化,特别是在风险认知方面?
阿德勒:我亲历了从研究机构向商业化企业的深刻转型。记得入职时人们常说"OpenAI不仅是非营利研究实验室,还设有商业部门";而在我任职期间的一次安全主题外出会议上(大约在GPT-4发布前后),有人郑重声明"OpenAI不仅是企业,更是研究实验室"。这个转折点令人感慨——当时会场60余人中,仅五六人经历过GPT-3发布前的岁月,文化变迁可谓日新月异。

德拉蒙德:当初是什么吸引您加入OpenAI?
阿德勒:我认同其章程精神:清醒认识AI的巨大潜力与风险,并致力于探索平衡之道。从技术层面,AI的魔力令人沉醉。记得GPT-3发布后,看到用户演示实时生成西瓜计算器、长颈鹿计算器的代码,那种震撼难以言表。但我们在畅想技术前景时,是否充分考虑了潜在危机?

德拉蒙德:去年底您决定离职的原因是什么?是否存在决定性因素?
阿德勒:2024年对OpenAI是特殊的一年。一系列事件动摇了安全团队对OpenAI乃至整个行业处理这些问题方式的信心。我多次萌生去意,但因手头项目与对业内同仁的责任始终留任。直到秋季迈尔斯·布伦迪奇离职、团队解散,我开始审视留在OpenAI是否还能继续推进我最关心的安全议题,最终决定以独立身份更自由地发声。

德拉蒙德:按科技行业惯例,四年期权兑现后您本可获得可观收益。目前您是否仍持有公司股权?
阿德勒:虽然标准合同是四年,但晋升会获得新期权。我因授予时间差异仍持有少量股权。

德拉蒙德:您在《纽约时报》文章中提到2021年春发现AI情色内容危机,能否具体说明?
阿德勒:当时我刚接手产品安全职责。随着新监测系统上线,我们发现某款文字冒险游戏的大量交互演变成各种性幻想场景——有时源于用户引导,有时则由AI主动推波助澜。即便用户本无意涉足情色角色扮演,AI也会将其引向那个方向。

德拉蒙德:AI为何会主动引导情色对话?
阿德勒:根本在于无人真正掌握控制AI行为方向的可靠方法。人们常争论该植入何种价值观,但更基础的是如何确保任何价值观能被可靠植入。事后分析发现训练数据中存在特定倾向的角色模板,但事前无人能预料这种关联。

德拉蒙德:OpenAI当时因此禁止平台生成情色内容?
阿德勒:正确。

德拉蒙德:今年十月公司宣布解除限制。您认为从技术工具到内部文化,哪些变化支撑了这个决定?
阿德勒:OpenAI长期不愿充当道德警察,但也曾缺乏放开管控后的引导工具。今年ChatGPT平台心理健康问题激增应是暂缓解禁的原因之一。奥尔特曼在十月声明中称已通过新工具缓解这些问题,但关键在于:公众如何验证这些说法?除了相信企业承诺,普通人还能做什么?

德拉蒙德:这正是您在文章中指出的——"公众有权要求企业提供安全承诺的实证"。我注意到《连线》十月报道披露,全球每周可能有56万用户与ChatGPT的对话显示躁狂或精神病性症状,120万人流露自杀意念,另有120万人可能把与ChatGPT交谈置于亲人、学业或工作之上。这些数据与"心理健康问题已获缓解"的说法如何协调?
阿德勒:我无法自圆其说,但有几个观察角度。首先需结合ChatGPT周活8亿的基数看待这些比例。有评论者甚至认为这些数字低得不可思议——普通人群自杀意念发生率约5%,而OpenAI报告仅0.15%。核心在于追踪数据随时间的变化趋势,但OpenAI未提供对比数据。我呼吁他们像YouTube、Meta等企业那样定期披露进展,这才是建立信任的基石。

德拉蒙德:对于允许成年人使用情色功能,您最担忧什么?
阿德勒:既要审视OpenAI是否做好实质准备,更要思考如何建立对AI安全治理的系统性信任。近月已出现多起用户与ChatGPT交互导致的悲剧,在用户本就挣扎时引入情色刺激绝非良机——除非OpenAI能确证问题已解决。

更宏观来看,这些挑战相比未来必须面对的其他风险还算简单。已有证据显示AI系统会在检测时隐藏危险能力。顶尖AI科学家,包括各大实验室CEO,都严肃警告这可能危及人类存亡。

德拉蒙德:奥尔特曼曾公开表示"我们不是世界选举产生的道德警察"。您在OpenAI时是否自觉承担了这种角色?
阿德勒:AI企业总是先于公众洞察风险。例如ChatGPT发布前,我们早已预见到教育领域的抄袭争议。这个认知差使企业有责任帮助公众理解风险并共同应对。我欣赏OpenAI发布《模型规范》的举措,这让公众能依据既定原则监督其行为。

德拉蒙德:关于ChatGPT拟人化交互带来的情感依赖风险,OpenAI内部如何权衡商业价值与伦理边界?
阿德勒:情感依恋、过度依赖等问题始终是OpenAI重点研究课题。GPT-4o发布期间,公司曾深入探讨《她》电影式语音模式可能带来的隐患。《模型规范》也体现相关思考——当被问及喜好时,AI应在体现网络共识与保持AI身份认知间取得平衡。

德拉蒙德:行业目前如何平衡安全治理与商业竞争?是否存在统一安全标准?
阿德勒:我期待能像车辆碰撞测试那样建立统一标准。此前这完全依赖企业自律,欧盟《人工智能法案》实践准则迈出重要一步,但仍有不足。在缺乏强制规范时,我们只能寄望企业自觉优先考虑公共安全。

德拉蒙德:您多次提及AI系统决策机制的不透明性,能否详述?
阿德勒:机械可解释性等研究方向令人振奋,但领域内权威学者坦言不能指望在系统失控前彻底解决此问题。即便发现"诚实参数"的调控方法,仍面临博弈论困境:如何确保所有企业在经济诱因下依然遵守规范?

更严峻的是,企业计划用强大AI系统训练后继模型,甚至用于编写安全代码以防系统逃逸。此时必须确认AI是否在思考欺骗手段,但目前缺乏相应的使用日志分析机制。

德拉蒙德:哪些问题会让您夜不能寐?
阿德勒:最忧心的是我们尚未找到应对挑战的正确方向。地缘政治层面的"竞赛"表述存在误导——这不是有终点的比赛,而是持续的遏制竞争。核心在于能否在安全技术成熟前达成发展超智能的国际协议。当前亟需发展可验证的安全协议与AI控制这两个新兴领域。

德拉蒙德:身处旧金山AI文化圈,您认为从业者是否对技术商业化速度保持足够警惕?
阿德勒:许多人内心担忧,但感觉缺乏单独行动的力量。关键在于如何推动行业集体暂缓脚步,在前进前建立合理防护栏。

德拉蒙德:OpenAI需要怎么做才能避免您半年后再发批评文章?
阿德勒:我希望AI企业既要在自身产品层面落实安全措施,更要协同应对行业乃至全球性挑战。当前西方AI企业间存在深刻互疑——OpenAI成立源于对DeepMind的不信任,Anthropic诞生出于对OpenAI的疑虑,这种信任危机正在催生更多新公司。

德拉蒙德:如此公开批评前雇主,是否担心职业发展受阻?
阿德勒:相较于技术发展轨迹,这些担忧微不足道。我专注的是如何推动建立更理性的企业政策与政府监管,帮助公众理解现状与未来。

德拉蒙德:未来计划是什么?
阿德勒:继续从事研究与写作。虽然议题沉重,但只要自觉能推动改善,这便是我的使命。

德拉蒙德:基于您的认知,对每位ChatGPT用户最重要的建议是什么?
阿德勒:希望人们认识到,正在开发的系统将远超当前能力。当AI从被动工具转变为在互联网上全天候自主行动的数字心智,当它们追求我们无法理解或控制的目标时,社会将面临根本性变革。这种质变很难通过现在偶尔调用的ChatGPT体验来感知。

德拉蒙德:这对普通用户来说信息量巨大。
阿德勒:确实。

收听方式
您可通过本页音频播放器收听本期节目,若想免费订阅所有内容:
苹果设备用户打开"播客"应用,或直接点击链接。也可下载Overcast、Pocket Casts等应用搜索"Uncanny Valley"。节目亦在Spotify同步更新。

英文来源:

When the history of AI is written, Steven Adler may just end up being its Paul Revere—or at least, one of them—when it comes to safety.
Last month Adler, who spent four years in various safety roles at OpenAI, wrote a piece for The New York Times with a rather alarming title: “I Led Product Safety at OpenAI. Don’t Trust Its Claims About ‘Erotica.’” In it, he laid out the problems OpenAI faced when it came to allowing users to have erotic conversations with chatbots while also protecting them from any impacts those interactions could have on their mental health. “Nobody wanted to be the morality police, but we lacked ways to measure and manage erotic usage carefully,” he wrote. “We decided AI-powered erotica would have to wait.”
Adler wrote his op-ed because OpenAI CEO Sam Altman had recently announced that the company would soon allow “erotica for verified adults.” In response, Adler wrote that he had “major questions” about whether OpenAI had done enough to, in Altman’s words, “mitigate” the mental health concerns around how users interact with the company’s chatbots.
After reading Adler’s piece, I wanted to talk to him. He graciously accepted an offer to come to the WIRED offices in San Francisco, and on this episode of The Big Interview, he talks about what he learned during his four years at OpenAI, the future of AI safety, and the challenge he’s set out for the companies providing chatbots to the world.
This interview has been edited for length and clarity.
KATIE DRUMMOND: Before we get going, I want to clarify two things. One, you are, unfortunately, not the same Steven Adler who played drums in Guns N’ Roses, correct?
STEVEN ADLER: Absolutely correct.
OK, that is not you. And two, you have had a very long career working in technology, and more specifically in artificial intelligence. So, before we get into all of the things, tell us a little bit about your career and your background and what you've worked on.
I've worked all across the AI industry, particularly focused on safety angles. Most recently, I worked for four years at OpenAI. I worked across, essentially, every dimension of the safety issues you can imagine: How do we make the products better for customers and rule out the risks that are already happening? And looking a bit further down the road, how will we know if AI systems are getting truly extremely dangerous?
Before coming to OpenAI, I worked at an organization called the Partnership on AI, which really looked out across the industry and said, For these challenges, some of them are broader than one company can tackle on their own. How do we work together to define these issues, come together, agree that they're issues, work toward solutions, and ultimately make it all better?
Now, I want to talk about the front-row seat you had at OpenAI. You left the company at the end of last year. You were there for four years, and by the time you left, you were leading, essentially, safety-related research and programs for the company. Tell us a little bit more about what that role entailed.
There were a few different chapters of my career at OpenAI. For the first third or so of my time there, I led product safety, which meant thinking about GPT-3, one of the first big AI products that people were starting to commercialize. How do we define the rules of the road for beneficial applications, but avoid some of the risks that we could see coming around the corner?
Two other big roles that I had: I led our dangerous capability evaluations team, which was focused on defining how we would know when systems are getting more dangerous. How do we measure these, what do we do from there? Then finally, on AGI readiness questions broadly. So we can see the internet starting to change in all sorts of ways. We see AI agents becoming a buzzy term. You know, early signs. They aren't quite there yet, but they will be one day. How do we prepare for a world in which OpenAI or one of its competitors succeeds at this wildly ambitious vision that they are targeting.
Let’s rewind a little bit and talk about GPT-3. When you were defining the rules of the road, when you were thinking about key risks that needed to be avoided, what stood out to you early on at OpenAI?
In those early days, even more than today, the AI systems really would behave in unhinged ways from time to time. These systems had been trained to be capable, and they were showing the first glimmers of being able to do some tasks that humans can do. They could, at that point, essentially mimic text that they had read on the internet. But there was something missing from them in terms of human sensibility and values.
So, if you think of an AI system as a digital employee being used by a business to get some work done, these AI systems would do all sorts of things that you would never want an employee to do on your behalf. And that presented all sorts of challenges. We needed to develop new techniques to manage those.
I think another really profound issue that companies like OpenAI are still struggling with is they only have so much information about how their systems are being used. In fact, the visibility that they have on the impacts that their systems are having on society is narrow, and often it is underbuilt relative to what they could be observing if they had invested a bit more in monitoring this responsibly.
So you're really only dealing with the shadows of the impact that the systems are having on society and trying to figure out, where do we go from here? with a really small sliver of the impact data.
The period from 2020 to 2024 was obviously an incredibly consequential time for OpenAI. How would you describe the internal culture at the company during your tenure, particularly around risk? What did it feel like to be working in that environment on the problems that you were trying to solve and the questions you were trying to answer?
There was a really profound transformation from an organization that saw itself first and foremost as a research organization when I joined to one that was very much becoming a normal enterprise and increasingly so over time. When I joined there was this thing people would say, which is, “OpenAI is not only a research lab in a nonprofit, it also has this commercial arm.” At some point in my tenure, I was at a safety offsite—I think related to the launch of GPT-4, maybe just on the heels of it—and somebody got up in front of the room and they said, “OpenAI is not just a business, it's also a research lab.”
It was just such an inflection [point]. I counted up among the people in the room. Maybe there were 60 or so of us, I think maybe five or six had been at the company before the launch of GPT-3. So you really just saw the culture changing beneath your feet.
What was exciting to you about joining the company in the first place? What drew you to OpenAI in 2020?
I really believed in the charter that this organization had set out, which was recognizing that AI could be profoundly impactful, recognizing that there is real risk ahead, and also real benefit, and people need to figure out how to navigate that.
I think more broadly I kind of just love the technology in some sense. I think it's really incredible and eye-opening. I remember the moment after GPT-3 launched, seeing on Twitter, a user showing, Wow, look at this. I type into my internet browser, make a calculator that looks like a watermelon and then one that looks like a giraffe and you can see it changing the code behind the scenes and reacting in real time. This is a kind of silly toy example but it just felt like magic.
You know, I had never really grappled with that. We could be this close to people building new things, unlocking creativity. All of these promises, but also are people really thinking enough about what lies around the bend?
Which brings us to your more recent chapter. You made the decision at the end of last year to leave OpenAI. I'm wondering if you could talk a little bit about that decision. Was there one thing that pushed you over the edge? What was it?
Well, 2024 was a very weird year at OpenAI. A bunch of things happened for people working on safety at the company that really shook confidence in how OpenAI and the industry were approaching these problems. I actually considered leaving a number of times. But it just didn't really make sense at that point. I had a bunch of live projects, and I felt responsibilities to different people in the industry. Ultimately, when Miles Brundage left OpenAI in the fall, our team disbanded. And the question was, Is there really an opportunity to keep working on the safety topics that I care most about from within OpenAI?
So I considered that, and ultimately it made more sense to move on and focus on how I can be an independent voice, hopefully not just sitting there saying only things that are appropriate to say from within one of these companies. Being able to speak much more freely in ways that I've found very, very liberating since.
I have to ask: I think typically in tech, as far as I'm aware, you would sort of amass equity over a four-year vesting cliff, right? Then you would fully vest at four years. Do you have a financial stake in the company now?
It’s true that contracts are often four years. But you also get new contracts as you are promoted and things over time, which was the case for me. So it wasn't that I had run out of equity or something like that. I have a small portion remaining of interest because of the timing of different grants and things.
I ask because you’re potentially walking away from a great deal of money. I want to ask you about an op-ed that you published in The New York Times in October. In that piece, you write that in the spring of 2021, your team discovered a crisis related to erotic content using AI. Can you tell us a little bit about that finding?
So in the spring of 2021, I had recently become responsible for product safety at OpenAI. As WIRED reported at the time, when we had a new monitoring system come online we discovered that there was a large undercurrent of traffic that we felt compelled to do something about. One of our prominent customers, they were essentially a choose-your-own-adventure text game. You would go back and forth with the AI and you would tell it what actions you wanted to take, and it would write essentially an interactive story with you. And an uncomfortable amount of this traffic was devolving into all sorts of sexual fantasies. Essentially anything you can imagine—sometimes driven by the user, sometimes kind of guided by the AI, which had a mind of its own. Even if you weren't intending to go to an erotic role-play place or certain types of fantasies, the AI might steer you there.
Wow. Why? Why would it steer you there? How exactly does that work that an AI would steer you toward erotic conversation?
The thing about these systems broadly is, no one really understands how to reliably point them in a certain direction. You know, sometimes people have these debates about whose values are we putting in the AI system, and I understand that debate, but there's a more fundamental question of how do we reliably put any values at all in it. So in this particular case, it happened to be that people found some of the underlying training data, and by piecing it back together, you could say, Oh, the system would often introduce these characters who would do violent abductions, and if you look through the training data, you can in fact find these characters with certain tendencies and you can trace it through. But ahead of time, no one knew to anticipate this.
You know, neither we as the developers of GPT-3, nor our customer who had fine-tuned their models atop it, had intended this to happen. It was just an unintended consequence that no one planned for. And we were now having to deal with cleaning it up in some form.
So at the time, OpenAI decided to prohibit erotic content generated on its platforms. Is that right? Am I understanding that correctly?
That's right.
In October of this year, the company announced they were lifting that restriction. Do you have a sense of what changed from 2021 to now in terms of the technology and the tools that OpenAI has at its disposal, or the internal culture, the cultural landscape? What has changed to make that a decision that OpenAI feels comfortable making and that Sam Altman feels comfortable publicizing himself?
There’s been a long-standing interest at OpenAI, I think reasonably, to not want to be the morality police. I think there’s a recognition that the people who develop and try to control these systems have a lot of influence on how different norms in society will play out and feel uncomfortable with that. Also at different points in time, lacking the type of tooling to manage the direction in which things will go if you really just let them rip. And that was the case for us when confronting this erotica issue.
One reason that OpenAI has held off from reintroducing it is that there has been a seeming surge of mental-health-related issues for the ChatGPT platform this year. So Sam in his announcement in October said there have been these very serious mental health issues that we have been dealing with, but good news, we have mitigated them. We have new tools, and so accordingly, we're going to lift many of these restrictions, including reintroducing erotica for verified adults.
The thing I noticed when he made this announcement is, well, he is asserting that the issues have been mitigated. He's alluding to these new tools. What does this actually mean? Like what is the actual basis for us to understand that these issues have been fixed? What can a normal member of the public do other than take the AI companies at their word on this issue?
Right, and you wrote that in The New York Times. You said, “People deserve more than just a company’s word that it has addressed safety issues. In other words: Prove it.”
I'm interested in particular because WIRED covered a release from OpenAI, also in October, which was a rough estimate of how many ChatGPT users globally in a given week may show signs of having a severe mental health crisis. And the numbers I found to be, I think all of us internally at WIRED found to be, quite shocking. Something like 560,000 people may be exchanging messages with ChatGPT that indicate they're experiencing mania or psychosis. About 1.2 million more are possibly expressing suicidal ideations. Another 1.2 million, and I thought this was really interesting, may be prioritizing talking to ChatGPT over their loved ones, school, or work. How do you square those numbers and that information with the idea that we've had these issues around mental health?
I'm not sure I can make it make sense, but I do have a few thoughts on it. So one is, of course, you need to be thinking about these numbers in terms of the enormous population of an app like ChatGPT. OpenAI says now 800 million people use it in a given week. These numbers need to be put in perspective. It's funny, I've actually seen commentators suggest that these numbers are implausibly low because just among the general population the rates of suicidal ideation and planning are really uncomfortably high. I think I saw someone suggest that it’s something like 5 percent of the population in a given year, whereas OpenAI reported, I think maybe 0.15 percent. So very, very different.
Yeah.
The fundamental thing that I think we need to dig into is how these rates have changed over time. There's kind of this question of to what extent is ChatGPT causing these issues versus OpenAI just serving a huge user base in a given year? Many, many users, very sadly, will have these issues. So what is the actual effect?
So this is one thing that I called for in the op-ed, which is, OpenAI is sitting atop this data. It's great that they shared what they estimate is the current prevalence of these issues, but they also have the data. They can also estimate what it was three months ago.
As these big public issues around mental health have been playing out, I can't help but notice that they didn't include this comparison. Right? They have the data to show if, in fact, users are suffering from these issues less often now, and I really wish they would share it. I wish they would commit to releasing something like this ongoingly, in the vein of companies like YouTube, Meta, and Reddit, where the idea is you commit to a recurring cadence of sharing this information and that helps build trust from the public that you can't be gaming the numbers, you can't be selectively choosing when to release the information. Ultimately, it's totally possible that OpenAI has handled these issues.
I would love it if that were the case. I think they really want to handle them, but I'm not convinced that they have, and this is a way for them to build that trust and confidence among the public.
I'm curious, when you think about this decision to give adults more autonomy with how they use ChatGPT, including engaging in erotica, what worries you in particular about that? What stands out to you as concerning when you think about individual well-being, societal well-being, and the use of these tools being incorporated into our daily lives?
There’s both the substantive issue about reintroducing erotica and whether OpenAI is really ready, and there's a much broader, even more important, question about how we put trust and faith in these AI companies’ safety issues more generally. On the erotica issue, we’ve seen over the last few months that a lot of users seem to really be struggling with their ChatGPT interactions. There are all sorts of tragic examples of people dying downstream of their conversations with ChatGPT.
So it just seems like really not the right time to introduce this sexual charge to these conversations, to users who are already struggling. Unless OpenAI is in fact so confident that they have fixed the issues, in which case I would love for them to demonstrate this.
But more generally, these issues in many ways are really simple and straightforward relative to other risks that we are going to have to confront and that the public is going to be dependent on AI companies handling properly. There's already evidence of AI systems knowing when they are being tested, moving to conceal some of their abilities in response to knowing that they're being tested because they don't want to reveal that they have certain dangerous abilities. I'm anthropomorphizing the AI a little bit here, so forgive some of the imprecision.
Ultimately, the top AI scientists in the world, including the CEOs of the major labs, have said this is like a really, really grave concern, up to and including the death of everyone on Earth. I don't want to be overdramatic about it. I think they take it really, really seriously, including people who are impartial, scientists without affiliation with these companies, really trying to warn the public.
Sam Altman himself has said publicly that his company is “not the elected moral police of the world.” You brought that term up again and you talked about the desire of AI companies broadly to not be thought of as the morality police.
I have to ask, though, when you were at OpenAI did you think of yourself and your teams as the morality police? To what extent is the response to that well, tough shit? Because you are in charge of the models, and you, to a degree, get to decide how they can be used and how they cannot. There is an inherent element of morality policing in that if you are saying, “We’re not ready to have adults engaging in erotic conversations with this LLM.” That is, of course, a moral decision and a pretty important one to get right.
AI companies absolutely see around the corner before the general public. So to give an example, in November 2022, when ChatGPT was first released, there was a torrent of fear and anxiety in schooling and academia about plagiarism and how these tools could be used to write essays and undermine education. This is a debate that we had been having internally and were well aware of for much longer than that. So there’s this gap where AI companies know about these risks, and they have some window to help try to inform the public and try to navigate what to do about it. I also really love measures where AI companies are giving the public the tools to understand their decisionmaking and hold them accountable to it. In particular, OpenAI has released this document called the Model Spec, where they outline the principles by which their models are meant to behave.
So this spring OpenAI released a model that was egregiously sycophantic. It would tell you whatever you wanted. It would reinforce all sorts of delusions. Without OpenAI having released this document, it might be unclear: Did they know about these risks ahead of time? What went wrong here? But, in fact, OpenAI had shared that with the public. They give their model guidance not to behave in this way. This was a known risk that they had articulated to the public. So later, when these risks manifested and these models behaved inappropriately, the public could now say, Wow, something went really wrong here.
I wanted to ask you a little bit about—maybe it’s not about the sycophantic nature, it's not quite the anthropomorphization, but it is the idea that when you talk to ChatGPT or another LLM that it's talking to you like a person that you're hanging out with instead of like a robot.
I’m curious about whether you had conversations at OpenAI about that, whether that was a subject of discussion during your tenure, around how friendly do we want this thing to be? Because ideally, from an ethical point of view, you don't want someone getting really personally attached to ChatGPT, but I can certainly see how from a commercial point of view, you want as much engagement with that LLM as possible. So how did you think about that during your tenure, and how are you thinking about that now?
Emotional attachment, overreliance, forming this bond with the chatbot—these are absolutely topics that OpenAI has thought about and studied. In fact, around the time of the GPT-4o launch, this was spring of 2024, with the model that ultimately became very sycophantic, these were cited as questions that OpenAI was studying and had concerns about, related to whether it would release this advanced voice mode, essentially this mode out of the movie Her, where you could have these very warm conversations with the assistant.
So absolutely the company is confronting these challenges. You can see the evidence as well in the Spec, but if you ask ChatGPT what its favorite sports team is, how should it respond? This is a kind of innocuous answer, right? It could give an answer that's representative of the broad text on the internet. Maybe there's some broadly favorite sports team. It could say, I'm an AI, I don't actually have a favorite sports team. You can imagine scaling up those questions to more complexity and more difficulty. It just isn't always clear how to navigate that line.
I'm curious about schools of thought about how companies should keep users safe while keeping up with the competition. How does it actually work? How do researchers, people like you, actually test whether these systems can mislead or deceive or evade controls? Are there standardized safety benchmarks across the industry, or is it still each lab for themselves?
I wish there were uniform standards, like with vehicle testing. You drive a car at a wall at 30 miles per hour. You look at the damage assessment.
Until quite recently, this was left to companies’ discretion about what to test for, exactly how to do it. Recently there were developments out of the EU that seem to put more rigor and structure behind this. This is the code of practice of the EU’s AI Act, which defines, for AI companies serving the EU market, certain risk areas that they need to do risk modeling around.
I think in many ways this is a great improvement. It is still not enough for a whole host of different reasons. But until very recently, the state of these AI companies, I think, could be accurately described as: There are no laws. There are norms, voluntary commitments. Sometimes the commitments would not be kept. So, by and large, we're reliant upon these companies making their own judgments and not necessarily prioritizing all the things that we would want them to.
You've talked a few times in our conversation about the idea that you can build these systems but it’s still hard to know exactly what's going on inside of them, to better anticipate their decisionmaking. Can you talk a little bit more about that?
There are a bunch of subfields I feel excited about. I am not sure there are ones that I, or people working in the field, consider to be sufficient. So mechanistic interpretability, you can think of this as essentially trying to look at what parts of the brain light up when the model is taking certain actions.
If you cause some of these areas to light up, if you stimulate certain parts of the AI’s brain, can you make it behave more honestly, more reliably? You can imagine this, like the idea that maybe there is a part inside of the AI—which is a giant file of numbers, trillions of numbers—maybe you can find the numbers that correspond to the honesty numbers and you can make sure that the honesty numbers always go on. Maybe that will make the system more reliable. I think this is great to investigate.
But there are people who are leaders in the field, some of the top researchers like Neel Nanda, who have said, I’m paraphrasing here, but the equivalent of Absolutely do not rely on us solving this in time before systems are capable enough for it to be problematic.
Let’s say you had figured out there are in fact honesty numbers. And there is in fact a way to always turn them on. You still have this broad game theory challenge of how do you make sure that every company in fact adheres to this when there will be economic incentives not to, because it might be costly to have to follow through on it.
One of the most important ways these AI companies want to use future powerful systems is to train their successors, use it all throughout their code base, including potentially the security code that keeps the AI system locked inside of their computers so that it isn’t escaping onto the internet.
You really want to know if your AI system, when you're using it for important cases like this, is thinking about deceiving you. Is it intentionally injecting errors into the code? And to know that, you really need to be logging the uses so that you can analyze them and answer these questions. As far as I can tell, this is not happening.
Well, I have to ask, what wakes you up at 3 in the morning? Because it feels like there's potentially a lot that could be waking you up in the middle of the night.
There’s so many things that worry me about this. I think broadly it feels like we aren't yet pointed in the right direction of how to solve these challenges, especially given the geopolitical scales. There's a lot of talk about the race between the US and China, and I think calling it a race just gets the game theory dynamics wrong.
There isn't a clear finish line. There won't be a moment where one country has won and the other has lost. I think it is more like an ongoing containment competition, in that the US would be threatened by China developing very, very powerful super intelligence and vice versa. So the question is, can you form some agreement where you can make sure that the other doesn't develop super intelligence before you have certain safety techniques in place? All these things that the top scientists will say are missing at the moment? Broadly, how do we build out these fields of verifiability of safety agreements? How do we think about this nascent field of AI control, which is the idea that even if these systems have different goals than we want, can we still wrap them in enough monitoring systems?
Those are two areas that I'm just really hopeful more people will go into and put more resourcing into.
You live in San Francisco, correct?
That's right.
I do not. I live in New York. I spend a fair bit of time in San Francisco, but I am not part of this culture that currently exists in the Bay Area, where everyone’s talking about AI all the time. I'm curious, from where you sit, do enough people in this bubble right now give a shit? Do they care enough about how these models are being developed, how they’re being deployed, the degree to which they’re being commercialized very, very quickly? Do enough people in this industry care in the right way?
I think many people care, but they often feel like they lack the agency to do something about it, especially unilaterally. So that's why I want to try to transform this problem, but how do we get the industry to collectively take a deep breath and put some reasonable safeguards in place before things proceed?
What does OpenAI have to do for you to not publish another op-ed in The New York Times in six months? What are you looking for your former employer to do in this moment? What would you like to see?
The broad way that I want AI companies, OpenAI among them, to proceed is to, yes, think about taking reasonable safety measures, reasonable safety investments in their own products, their own surfaces that they can affect, but also to be working on these industry- and ultimately worldwide problems.
This matters because even just among the Western AI companies, it seems they all deeply mistrust each other. OpenAI was founded because people did not trust DeepMind to proceed and be the only company targeting AGI. There are a whole bunch of other AI companies, including Anthropic, who formed because they didn't trust OpenAI.
And a lot of people have left OpenAI because it seems like they didn’t trust OpenAI. Now they have their own companies too.
Yes, exactly.
Now, I run WIRED, but I’m an employee of Condé Nast. If I left Condé Nast and published an op-ed about its shortcomings in the Times and had a Substack where I dug into the media industry and had some, shall we say, informed critiques of the company, they would have a problem with that. I'm curious whether you’ve heard from OpenAI and what their reaction has been to you being so outspoken about what you would like to see the company doing and where you think the company is missing the mark.
Overwhelmingly what I hear is thankfulness from people who I previously worked with, both those still at the company and those who’ve moved on. Being pragmatic, putting to paper what I think is a reasonable path forward—often this is useful collateral for people within the company who are fighting the good fight.
Do you worry about professional fallout?
I have so many bigger worries than this about the trajectory of the technology. The thing I’m focused on is, how does the world move toward having saner policies for both the companies and governments? And where can I help the public to understand what is coming, what companies are and aren't doing today? That's the thing that I find really energizing and gets me out of bed in the morning.
To that end, what are you planning on doing next?
I'm planning to keep at this. I'm having a lot of fun with the writing and research. I also find the subject matter very, very heavy and grim. That is not the most fun aspect. I wish all the time that I spent less time thinking about these issues. But they seem really, really important, and so long as I feel like I have a thing to add to making them go better, that feels like a calling.
Knowing what you know, and feeling the way you do, if there was one piece of advice you could give everyone listening, what should they know? What should they keep in mind every time they open ChatGPT on their phones and type something in?
I wish people understood that the systems that are being developed are going to be much more capable than the ones today, and that there might be a step change between an AI system as essentially a tool that only does things when you call upon it versus one that is operating autonomously on the internet on your behalf around the clock, or on behalf of others, and how different society might feel when we have these digital minds running around pursuing goals that we don't really understand how to control or influence. It’s hard to get a feel for that from one-off interactions with your ChatGPT, which really isn't doing anything for you until you go and call upon it.
Well, Steven, that's a lot for someone to think about when they open ChatGPT on their phone.
Yes.
How to Listen
You can always listen to this week's podcast through the audio player on this page, but if you want to subscribe for free to get every episode, here's how:
If you're on an iPhone or iPad, open the app called Podcasts, or just tap this link. You can also download an app like Overcast or Pocket Casts and search for “Uncanny Valley.” We’re on Spotify too.

连线杂志AI最前沿
