
Can fake faces make AI training more ethical?

Published by qimuai · First-hand compiled translation



Source: https://www.sciencenews.org/article/fake-faces-ai-training-ethical

Summary:

AI facial recognition approaches a new ethical stage: synthetic face data may be the key to achieving both privacy and fairness

AI facial recognition systems have long drawn criticism for discriminatory errors against particular demographic groups. Early systems were extremely accurate for white men, while error rates for other ethnic and gender groups could be as much as 100 times higher, with consequences ranging from failed phone unlocks to wrongful arrests.

In recent years, accuracy has improved sharply thanks to better-balanced datasets, greater computing power and improved loss functions. Xiaoming Liu, a computer scientist at Michigan State University, notes that the best algorithms now reach nearly 99.9 percent accuracy across skin tones, ages and genders. But that accuracy has come at a privacy cost: companies and research institutions have often scraped millions of real face images from the internet without consent to train their models, raising concerns about data misuse, identity theft and surveillance overreach.

Synthetic face data has emerged as a way out of the privacy bind. These computer-generated images correspond to no real individual. Models trained on them are still less accurate than those trained on real data, but researchers expect that, as generative AI improves, synthetic data can protect privacy while maintaining fairness and accuracy across groups. Ketan Kotwal of the Idiap Research Institute in Switzerland stresses that every person, regardless of skin color, gender or age, should have an equal chance of being correctly recognized.

Technically, modern facial recognition relies on convolutional neural networks (CNNs) to extract facial features and condense them into a numerical template. Because early training datasets were dominated by white men, the algorithms were weaker at recognizing other groups. A 2018 study found that commercial algorithms had far higher error rates for darker-skinned people, even misclassifying Michelle Obama and Oprah Winfrey as men. In 2019 the U.S. National Institute of Standards and Technology (NIST) further confirmed that accuracy for Asian and Black faces was only a tenth to a hundredth of that for white faces.

Using synthetic data involves two steps: first generating fictional faces, then creating variants with different angles, lighting and accessories. The generators still need some real images for training, but far fewer than the millions required to train a recognition model directly. Recent results show that a model trained on synthetic data had lower average accuracy than one trained on real data (75 percent versus 85 percent) but was far more consistent across racial groups, with only a third of the conventional model's variability.

The current challenges are that generators can produce only a limited number of distinct identities, and that their "too perfect" images lack the messiness of real-world scenes, such as faces obscured by shadows. Researchers next plan to explore a hybrid approach: train a model on synthetic data to learn features across demographic groups, then fine-tune it with real data obtained with consent.

Even as academics work to make the technology fairer, civil rights groups warn that highly accurate facial recognition could enable round-the-clock tracking. Researchers counter that while accurate systems carry risks, the harms from misidentification by inaccurate systems also demand vigilance. Rapid progress in synthetic data may offer a new way to balance ethics and performance in AI.

Full article (English source):

Can fake faces make AI training more ethical?
Synthetic images may offer hope for training private, fair face recognition
By Celina Zhao
AI has long been guilty of systematic errors that discriminate against certain demographic groups. Facial recognition was once one of the worst offenders.
For white men, it was extremely accurate. For others, the error rates could be 100 times as high. That bias has real consequences — ranging from being locked out of a cell phone to wrongful arrests based on faulty facial recognition matches.
Within the past few years, that accuracy gap has dramatically narrowed. “In close range, facial recognition systems are almost quite perfect,” says Xiaoming Liu, a computer scientist at Michigan State University in East Lansing. The best algorithms now can reach nearly 99.9 percent accuracy across skin tones, ages and genders.
But high accuracy has a steep cost: individual privacy. Corporations and research institutions have swept up the faces of millions of people from the internet to train facial recognition models, often without their consent. Not only are the data stolen, but this practice also potentially opens doors for identity theft or oversteps in surveillance.
To solve the privacy issues, a surprising proposal is gaining momentum: using synthetic faces to train the algorithms.
These computer-generated images look real but do not belong to any actual people. The approach is in its early stages; models trained on these “deepfakes” are still less accurate than those trained on real-world faces. But some researchers are optimistic that as generative AI tools improve, synthetic data will protect personal data while maintaining fairness and accuracy across all groups.
“Every person, irrespective of their skin color or their gender or their age, should have an equal chance of being correctly recognized,” says Ketan Kotwal, a computer scientist at the Idiap Research Institute in Martigny, Switzerland.
How artificial intelligence identifies faces
Advanced facial recognition first became possible in the 2010s, thanks to a new type of deep learning architecture called a convolutional neural network. CNNs process images through many sequential layers of mathematical operations. Early layers respond to simple patterns such as edges and curves. Later layers combine those outputs into more complex features, such as the shapes of eyes, noses and mouths.
In modern face recognition systems, a face is first detected in an image, then rotated, centered and resized to a standard position. The CNN then glides over the face, picks out its distinctive patterns and condenses them into a vector — a list-like collection of numbers — called a template. This template can contain hundreds of numbers and “is basically your Social Security number,” Liu says.
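To make the pipeline above concrete, here is a minimal Python/PyTorch sketch of a face encoder that condenses an aligned face crop into a template vector; the tiny architecture and the 512-number template size are illustrative assumptions, not the design of any production system.
```python
# Minimal sketch: turning an aligned face crop into a fixed-length "template".
# The tiny architecture and 512-dim template size are illustrative assumptions.
import torch
import torch.nn as nn

class TinyFaceEncoder(nn.Module):
    def __init__(self, template_dim: int = 512):
        super().__init__()
        self.features = nn.Sequential(
            # Early layers respond to simple patterns (edges, curves).
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            # Later layers combine them into more complex shapes (eyes, nose, mouth).
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(128, template_dim)

    def forward(self, aligned_face: torch.Tensor) -> torch.Tensor:
        x = self.features(aligned_face).flatten(1)
        # L2-normalized vector: the "template" that gets compared between faces.
        return nn.functional.normalize(self.head(x), dim=1)

encoder = TinyFaceEncoder()
crop = torch.randn(1, 3, 112, 112)   # a detected, rotated, centered, resized face
template = encoder(crop)             # shape: (1, 512)
print(template.shape)
```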
To do all of this, the CNN is first trained on millions of photos showing the same individuals under varying conditions — different lighting, angles, distance or accessories — and labeled with their identity. Because the CNN is told exactly who appears in each photo, it learns to position templates of the same person close together in its mathematical “space” and push those of different people farther apart.
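The "pull same identities together, push different ones apart" objective can be written several ways; the article does not say which loss any given system uses, so the sketch below shows one standard option, a triplet margin loss, reusing the toy encoder from the previous sketch.
```python
# Sketch of identity-supervised training: templates of the same person are pulled
# together, templates of different people pushed apart. A triplet margin loss is one
# standard choice; production systems often use margin-based softmax losses instead.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    d_pos = 1.0 - F.cosine_similarity(anchor, positive)   # same identity: want small
    d_neg = 1.0 - F.cosine_similarity(anchor, negative)   # different identity: want large
    return F.relu(d_pos - d_neg + margin).mean()

# anchor/positive: two photos of the same person (different lighting, angle, etc.);
# negative: a photo of someone else. Random tensors stand in for real batches here.
a, p, n = (encoder(torch.randn(8, 3, 112, 112)) for _ in range(3))
loss = triplet_loss(a, p, n)
loss.backward()
```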
This representation forms the basis for the two main types of facial recognition algorithms. There’s “one-to-one”: Are you who you say you are? The system checks your face against a stored photo, like when unlocking a smartphone or going through passport control. The other is “one-to-many”: Who are you? The system searches for your face in a large database to find a match.
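In code, both modes reduce to comparing template vectors; a minimal sketch (cosine similarity, with a 0.6 decision threshold chosen purely for illustration) might look like this:
```python
# Sketch of the two modes of facial recognition, operating on template vectors.
import torch
import torch.nn.functional as F

def verify(probe: torch.Tensor, enrolled: torch.Tensor, threshold: float = 0.6) -> bool:
    """One-to-one: are you who you say you are? (e.g., phone unlock)"""
    return F.cosine_similarity(probe, enrolled, dim=0).item() >= threshold

def identify(probe: torch.Tensor, gallery: torch.Tensor, names: list[str]) -> str:
    """One-to-many: who are you? Search a database of templates for the best match."""
    scores = F.cosine_similarity(probe.unsqueeze(0), gallery)  # one score per enrollee
    return names[int(scores.argmax())]

probe = F.normalize(torch.randn(512), dim=0)
gallery = F.normalize(torch.randn(1000, 512), dim=1)
names = [f"person_{i}" for i in range(1000)]
print(verify(probe, gallery[0]), identify(probe, gallery, names))
```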
But it didn’t take researchers long to realize these algorithms don’t work equally well for everyone.
Why fairness in facial recognition has been elusive
A 2018 study was the first to drop the bombshell: In commercial facial classification algorithms, the darker a person’s skin, the more errors arose. Even famous Black women were classified as men, including Michelle Obama by Microsoft and Oprah Winfrey by Amazon.
Facial classification is a little different than facial recognition. Classification means assigning a face to a category, such as male or female, rather than confirming identity. But experts noted that the core challenge in classification and recognition is the same. In both cases, the algorithm must extract and interpret facial features. More frequent failures for certain groups suggest algorithmic bias.
In 2019, the National Institute of Standards and Technology offered further confirmation. After evaluating nearly 200 commercial algorithms, NIST found that one-to-one matching algorithms had just a tenth to a hundredth of the accuracy in identifying Asian and Black faces compared with white faces, and several one-to-many algorithms produced more false positives for Black women.
The errors these tests point out can have serious, real-world consequences. There have been at least eight instances of wrongful arrests due to facial recognition. Seven of them were Black men.
Bias in facial recognition models is “inherently a data problem,” says Anubhav Jain, a computer scientist at New York University. Early training datasets often contained far more white men than other demographic groups. As a result, the models became better at distinguishing between white, male faces compared with others.
Today, balancing out the datasets, advances in computing power and smarter loss functions — a training step that helps algorithms learn better — have helped push facial recognition to near perfection. NIST continues to benchmark systems through monthly tests, where hundreds of companies voluntarily submit their algorithms, including ones used in places like airports. Since 2018, error rates have dropped over 90 percent, and nearly all algorithms boast over 99 percent accuracy in controlled settings.
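The article does not spell out how datasets are balanced in practice; one routine technique is to oversample under-represented groups during training, sketched below with PyTorch's WeightedRandomSampler and made-up group labels.
```python
# Sketch: oversampling under-represented demographic groups during training.
# The group labels are made up; real pipelines define their own categories.
from collections import Counter
from torch.utils.data import WeightedRandomSampler

group_of_image = ["A", "A", "A", "A", "B", "C"]       # toy group label per training image
counts = Counter(group_of_image)
weights = [1.0 / counts[g] for g in group_of_image]   # rarer group -> sampled more often

sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
# Passing `sampler=sampler` to a DataLoader makes each group appear at a similar rate.
print(list(sampler))
```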
In turn, demographic bias is no longer a fundamental algorithmic issue, Liu says. “When the overall performance gets to 99.9 percent, there’s almost no difference among different groups, because every demographic group can be classified really well.”
While that seems like a good thing, there is a catch.
Could fake faces solve privacy concerns?
After the 2018 study on algorithms mistaking dark-skinned women for men, IBM released a dataset called Diversity in Faces. The dataset was filled with more than 1 million images annotated with people’s race, gender and other attributes. It was an attempt to create the type of large, balanced training dataset that its algorithms were criticized for lacking.
But the images were scraped from the photo-sharing website Flickr without asking the image owners, triggering a huge backlash. And IBM is far from alone. Another big vendor used by law enforcement, Clearview AI, is estimated to have gathered over 60 billion images from places like Instagram and Facebook without consent.
These practices have ignited another set of debates on how to ethically collect data for facial recognition. Biometric databases pose huge privacy risks, Jain says. “These images can be used fraudulently or maliciously,” such as for identity theft or surveillance.
One potential fix? Fake faces. By using the same technology behind deepfakes, a growing number of researchers think they can create the type and quantity of fake identities needed to train models. Assuming the algorithm doesn’t accidentally spit out a real face, “there’s no problem with privacy,” says Pavel Korshunov, a computer scientist also at the Idiap Research Institute.
Creating the synthetic datasets requires two steps. First, generate a unique fake face. Then, make variations of that face under different angles, lighting or with accessories. Though the generators that do this still need to be trained on thousands of real images, they require far fewer than the millions needed to train a recognition model directly.
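Structurally, the two-step recipe looks like the sketch below; generate_identity and render_variation are hypothetical placeholders for whatever face generator a team actually uses (the study used text-to-image models), so only the overall shape of the dataset-building loop is meaningful here.
```python
# Sketch of the two-step synthetic dataset recipe.
# `generate_identity` and `render_variation` are hypothetical stand-ins for a real
# face generator; only the overall structure is shown.
import itertools
import random

ATTRIBUTES = {"angle": ["frontal", "left", "right"],
              "lighting": ["studio", "dim", "outdoor"],
              "accessory": ["none", "glasses", "hat"]}

def generate_identity(seed: int):
    """Step 1: invent a unique fake face (placeholder)."""
    return {"identity_id": seed}

def render_variation(identity, angle, lighting, accessory):
    """Step 2: re-render that same fake face under new conditions (placeholder)."""
    return {**identity, "angle": angle, "lighting": lighting, "accessory": accessory}

dataset = []
for seed in range(10_000):                       # roughly 10,000 identities, as in the study
    identity = generate_identity(seed)
    for angle, lighting, accessory in itertools.product(*ATTRIBUTES.values()):
        if random.random() < 0.5:                # keep a random subset of variations
            dataset.append(render_variation(identity, angle, lighting, accessory))
```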
Now, the challenge is to get models trained with synthetic data to be highly accurate for everyone. A study submitted July 28 to arXiv.org reports that models trained with demographically balanced synthetic datasets were better at reducing bias across racial groups than models trained on real datasets of the same size.
In the study, Korshunov, Kotwal and colleagues used two text-to-image models to each generate about 10,000 synthetic faces with balanced demographic representation. They also randomly selected 10,000 real faces from a dataset called WebFace. Facial recognition models were individually trained on the three sets.
When tested on African, Asian, Caucasian and Indian faces, the WebFace-trained model achieved an average accuracy of 85 percent but showed bias: It was 90 percent accurate for Caucasian faces and only 81 percent for African faces. This disparity probably stems from WebFace’s overrepresentation of Caucasian faces, Korshunov says, a sampling issue that often plagues real-world datasets that aren’t purposefully trying to be balanced.
Though one of the models trained on synthetic faces had a lower average accuracy of 75 percent, it had only a third of the variability of the WebFace model between the four demographic groups. That means that even though overall accuracy dropped, the model’s performance was far more consistent regardless of race.
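The comparison reported here boils down to two statistics, average accuracy and its spread across demographic groups; a short sketch of how such numbers could be computed from evaluation records follows (toy data, not the study's results).
```python
# Sketch: measuring both average accuracy and cross-group consistency of a
# face recognition model. Per-group accuracies here come from toy records.
from statistics import mean, pstdev

def per_group_accuracy(records):
    """records: iterable of (demographic_group, prediction_correct) pairs."""
    totals, correct = {}, {}
    for group, ok in records:
        totals[group] = totals.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + int(ok)
    return {g: correct[g] / totals[g] for g in totals}

def summarize(records):
    acc = per_group_accuracy(records)
    # The mean says how good the model is overall; the spread across groups
    # (standard deviation) says how uneven, i.e. how biased, it is.
    return mean(acc.values()), pstdev(acc.values()), acc

records = [("African", True), ("African", False), ("Asian", True),
           ("Caucasian", True), ("Indian", True), ("Indian", False)]
avg, spread, by_group = summarize(records)
print(f"average={avg:.2f}, cross-group spread={spread:.2f}, per-group={by_group}")
```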
This drop in accuracy is currently the biggest hurdle for using synthetic data to train facial recognition algorithms. It comes down to two main reasons. The first is a limit in how many unique identities a generator can produce. The second is that most generators tend to generate pretty, studio-like pictures that don’t reflect the messy variety of real-world images, such as faces obscured by shadows.
To push accuracy higher, researchers plan to explore a hybrid approach next: Using synthetic data to teach a model the facial features and variations common to different demographic groups, then fine-tuning that model with real-world data obtained with consent.
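Such a hybrid recipe might be organized as in the sketch below, which reuses the toy encoder and triplet loss from the earlier sketches; fine-tuning only the final projection layer on consented real data is an assumption made for illustration, not the researchers' published procedure.
```python
# Sketch of the proposed hybrid recipe: pre-train on synthetic faces, then
# fine-tune on a smaller set of real images collected with consent.
# Reuses `encoder` and `triplet_loss` from the sketches above; the loaders are
# toy stand-ins for real (anchor, positive, negative) image batches.
import torch

synthetic_loader = [tuple(torch.randn(8, 3, 112, 112) for _ in range(3)) for _ in range(4)]
consented_real_loader = [tuple(torch.randn(8, 3, 112, 112) for _ in range(3)) for _ in range(2)]

def train(model, loader, params, epochs, lr):
    opt = torch.optim.Adam(params, lr=lr)
    for _ in range(epochs):
        for anchor, positive, negative in loader:
            loss = triplet_loss(model(anchor), model(positive), model(negative))
            opt.zero_grad()
            loss.backward()
            opt.step()

# Stage 1: learn demographically balanced face features from synthetic identities.
train(encoder, synthetic_loader, encoder.parameters(), epochs=3, lr=1e-3)

# Stage 2: adapt to real-world variation with far less (consented) real data,
# updating only the final projection layer to limit what the model can memorize.
train(encoder, consented_real_loader, encoder.head.parameters(), epochs=1, lr=1e-4)
```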
The field is advancing quickly — the first proposals to use synthetic data for training facial recognition models emerged only in 2023. Still, given the rapid improvements in image generators since then, Korshunov says he’s eager to see just how far synthetic data can go.
But accuracy in facial recognition can be a double-edged sword. If inaccurate, the algorithm itself causes harm. If accurate, human error can still come from overreliance on the system. And civil rights advocates warn that too-accurate facial recognition technologies could indefinitely track us across time and space.
Academic researchers acknowledge this tricky balance but see the outcome differently. “If you use a less accurate system, you are likely to track the wrong people,” Kotwal says. “So if you want to have a system, let’s have a correct, highly accurate one.”
