
Can fake faces make AI training more ethical?


Source: https://www.sciencenews.org/article/fake-faces-ai-training-ethical

Summary:

[Tech Frontier] Synthetic face images may resolve AI's ethics dilemma, promising both privacy and fairness
Reporter: Celina Zhao

Facial recognition has long been criticized for systematically misidentifying certain groups: it was extremely accurate for white men, but error rates for other ethnicities and genders could be up to 100 times higher, with consequences ranging from phones that refuse to unlock to wrongful arrests. In recent years accuracy has improved dramatically, and the best algorithms now reach nearly 99.9 percent accuracy across skin tones, ages and genders.

That accuracy, however, has come at a steep privacy cost: companies and research institutions routinely scrape huge numbers of real face images from the internet without consent to train their models, violating personal privacy and opening the door to identity theft and surveillance abuse. To break the impasse, researchers have proposed an alternative: training AI models on synthetic face images.

These computer-generated faces look realistic but correspond to no real person. Models trained on them are still less accurate than those trained on real data, but as generative AI improves, synthetic data may protect privacy while keeping algorithms fair and accurate for every group.

Technically, modern facial recognition relies on convolutional neural networks (CNNs), which extract facial features through stacked layers of mathematical operations and convert them into numerical templates. During training, the model compares millions of photos of the same people taken under different conditions and learns to tell identities apart. Early datasets, however, overrepresented white men, leaving the algorithms weaker on everyone else. A 2018 study found very high error rates for people with darker skin, with even famous Black women misclassified as men.

In 2019 the U.S. National Institute of Standards and Technology (NIST) added further confirmation, finding that accuracy in identifying Asian and Black faces was only a tenth to a hundredth of that for white faces. By 2023, balanced datasets, greater computing power and better loss functions had cut error rates by more than 90 percent, and the gaps between demographic groups had narrowed sharply.

The privacy controversy has not gone away. IBM faced a strong backlash for building a "Diversity in Faces" dataset from Flickr photos scraped without permission, and Clearview AI has been accused of collecting some 60 billion social-media images without consent. Synthetic data has emerged as a possible way out: researchers use text-to-image models to create demographically balanced virtual faces, then generate variants under different angles and lighting. In experiments, a model trained on synthetic data averaged lower accuracy than one trained on real data (75 percent versus 85 percent), but its performance varied only a third as much across ethnic groups, a marked gain in fairness.

Two main challenges remain for synthetic data: generators can produce only a limited number of distinct identities, and the images they make tend to be too "perfect," lacking the messiness of real-world scenes. Next, researchers plan to explore a hybrid approach: first learn cross-group facial features from synthetic data, then fine-tune the model on a small amount of real data obtained with consent.

Although progress is rapid (the first work on training facial recognition models with synthetic data appeared only in 2023), researchers remain wary of how double-edged high accuracy can be: overly precise systems could enable pervasive surveillance, while errors lead to misidentification. As one researcher put it, "if you want to have a system, let's have a correct, highly accurate one." On the tightrope between ethics and technology, synthetic data may be opening a new path.


Original English article:

Can fake faces make AI training more ethical?
Synthetic images may offer hope for training private, fair face recognition
By Celina Zhao
AI has long been guilty of systematic errors that discriminate against certain demographic groups. Facial recognition was once one of the worst offenders.
For white men, it was extremely accurate. For others, the error rates could be 100 times as high. That bias has real consequences — ranging from being locked out of a cell phone to wrongful arrests based on faulty facial recognition matches.
Within the past few years, that accuracy gap has dramatically narrowed. “In close range, facial recognition systems are almost quite perfect,” says Xiaoming Liu, a computer scientist at Michigan State University in East Lansing. The best algorithms now can reach nearly 99.9 percent accuracy across skin tones, ages and genders.
But high accuracy has a steep cost: individual privacy. Corporations and research institutions have swept up the faces of millions of people from the internet to train facial recognition models, often without their consent. Not only are the data stolen, but this practice also potentially opens doors for identity theft or oversteps in surveillance.
To solve the privacy issues, a surprising proposal is gaining momentum: using synthetic faces to train the algorithms.
These computer-generated images look real but do not belong to any actual people. The approach is in its early stages; models trained on these “deepfakes” are still less accurate than those trained on real-world faces. But some researchers are optimistic that as generative AI tools improve, synthetic data will protect personal data while maintaining fairness and accuracy across all groups.
“Every person, irrespective of their skin color or their gender or their age, should have an equal chance of being correctly recognized,” says Ketan Kotwal, a computer scientist at the Idiap Research Institute in Martigny, Switzerland.
How artificial intelligence identifies faces
Advanced facial recognition first became possible in the 2010s, thanks to a new type of deep learning architecture called a convolutional neural network. CNNs process images through many sequential layers of mathematical operations. Early layers respond to simple patterns such as edges and curves. Later layers combine those outputs into more complex features, such as the shapes of eyes, noses and mouths.
In modern face recognition systems, a face is first detected in an image, then rotated, centered and resized to a standard position. The CNN then glides over the face, picks out its distinctive patterns and condenses them into a vector — a list-like collection of numbers — called a template. This template can contain hundreds of numbers and “is basically your Social Security number,” Liu says.
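For a concrete picture of the two paragraphs above, here is a minimal PyTorch sketch of a toy CNN that turns an aligned face crop into a normalized template vector. The layer sizes and embedding dimension are illustrative choices, not those of any production system described in the article.

```python
# Minimal sketch (illustrative only): a small CNN that maps an aligned face crop
# to a fixed-length "template" vector.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FaceEmbedder(nn.Module):
    def __init__(self, embedding_dim: int = 256):
        super().__init__()
        # Early layers respond to simple patterns such as edges and curves...
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # ...later layers condense them into a compact numeric template.
        self.head = nn.Linear(128, embedding_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: a batch of detected, aligned face crops, shape (N, 3, 112, 112)
        feats = self.features(x).flatten(1)
        # L2-normalize so templates can be compared by cosine similarity
        return F.normalize(self.head(feats), dim=1)

model = FaceEmbedder()
aligned_face = torch.randn(1, 3, 112, 112)   # stand-in for a detected, aligned face
template = model(aligned_face)                # a few hundred numbers describing the face
```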
To do all of this, the CNN is first trained on millions of photos showing the same individuals under varying conditions — different lighting, angles, distance or accessories — and labeled with their identity. Because the CNN is told exactly who appears in each photo, it learns to position templates of the same person close together in its mathematical “space” and push those of different people farther apart.
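The pull-together, push-apart objective can be written down directly. The sketch below uses a standard triplet margin loss as one illustration; commercial systems typically use more elaborate losses, so treat this as a toy version of the idea, not the article's method.

```python
# Toy illustration of the training objective: templates of the same person are
# pulled together, templates of different people pushed apart.
import torch

triplet_loss = torch.nn.TripletMarginLoss(margin=0.2)

def training_step(model, anchor_imgs, positive_imgs, negative_imgs, optimizer):
    # anchor/positive: two labeled photos of the same identity under different
    # conditions; negative: a photo of a different identity.
    a = model(anchor_imgs)
    p = model(positive_imgs)
    n = model(negative_imgs)
    loss = triplet_loss(a, p, n)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```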
This representation forms the basis for the two main types of facial recognition algorithms. There’s “one-to-one”: Are you who you say you are? The system checks your face against a stored photo, like when unlocking a smartphone or going through passport control. The other is “one-to-many”: Who are you? The system searches for your face in a large database to find a match.
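In code, the two modes differ only in what a query template is compared against. The threshold below is an arbitrary placeholder, not a value from any deployed system.

```python
# Sketch of the two modes, assuming 1-D, L2-normalized template vectors.
import torch

def verify(template_a: torch.Tensor, template_b: torch.Tensor, threshold: float = 0.6) -> bool:
    """One-to-one: are you who you say you are?"""
    similarity = torch.dot(template_a, template_b).item()   # cosine similarity
    return similarity >= threshold

def identify(query: torch.Tensor, gallery: torch.Tensor, names: list[str], threshold: float = 0.6):
    """One-to-many: who are you? gallery is a (K, D) matrix of enrolled templates."""
    sims = gallery @ query                  # similarity against every enrolled face
    best = int(torch.argmax(sims))
    return names[best] if sims[best] >= threshold else None
```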
But it didn’t take researchers long to realize these algorithms don’t work equally well for everyone.
Why fairness in facial recognition has been elusive
A 2018 study was the first to drop the bombshell: In commercial facial classification algorithms, the darker a person’s skin, the more errors arose. Even famous Black women were classified as men, including Michelle Obama by Microsoft and Oprah Winfrey by Amazon.
Facial classification is a little different than facial recognition. Classification means assigning a face to a category, such as male or female, rather than confirming identity. But experts noted that the core challenge in classification and recognition is the same. In both cases, the algorithm must extract and interpret facial features. More frequent failures for certain groups suggest algorithmic bias.
In 2019, the National Institute of Standards and Technology offered further confirmation. After evaluating nearly 200 commercial algorithms, NIST found that one-to-one matching algorithms had just a tenth to a hundredth of the accuracy in identifying Asian and Black faces compared with white faces, and several one-to-many algorithms produced more false positives for Black women.
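NIST's benchmarks are far more involved than this, but the underlying per-group metric can be sketched simply: compute the false-match rate separately for each demographic group from labeled impostor comparisons. This is an illustrative outline, not NIST's protocol.

```python
# Illustrative only: per-group false-match rate from labeled comparison trials.
from collections import defaultdict

def false_match_rate_by_group(trials):
    """trials: iterable of (group, same_person: bool, matched: bool)."""
    impostor, false_matches = defaultdict(int), defaultdict(int)
    for group, same_person, matched in trials:
        if not same_person:              # impostor comparison: different people
            impostor[group] += 1
            if matched:                  # system wrongly said "same person"
                false_matches[group] += 1
    return {g: false_matches[g] / impostor[g] for g in impostor}
```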
The errors these tests point out can have serious, real-world consequences. There have been at least eight instances of wrongful arrests due to facial recognition. Seven of them were Black men.
Bias in facial recognition models is “inherently a data problem,” says Anubhav Jain, a computer scientist at New York University. Early training datasets often contained far more white men than other demographic groups. As a result, the models became better at distinguishing between white, male faces compared with others.
Today, balancing out the datasets, advances in computing power and smarter loss functions — a training step that helps algorithms learn better — have helped push facial recognition to near perfection. NIST continues to benchmark systems through monthly tests, where hundreds of companies voluntarily submit their algorithms, including ones used in places like airports. Since 2018, error rates have dropped over 90 percent, and nearly all algorithms boast over 99 percent accuracy in controlled settings.
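One simple way to "balance out" a training set is to oversample underrepresented groups so each demographic contributes roughly equally per epoch. The snippet below is an illustrative sketch using PyTorch's weighted sampler, not any particular vendor's pipeline.

```python
# Illustrative demographic rebalancing via weighted sampling.
from collections import Counter
from torch.utils.data import WeightedRandomSampler

def balanced_sampler(group_labels):
    counts = Counter(group_labels)                        # e.g. {"group_a": 70000, "group_b": 5000}
    weights = [1.0 / counts[g] for g in group_labels]     # rarer groups get larger sampling weight
    return WeightedRandomSampler(weights, num_samples=len(group_labels), replacement=True)
```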
In turn, demographic bias is no longer a fundamental algorithmic issue, Liu says. “When the overall performance gets to 99.9 percent, there’s almost no difference among different groups, because every demographic group can be classified really well.”
While that seems like a good thing, there is a catch.
Could fake faces solve privacy concerns?
After the 2018 study on algorithms mistaking dark-skinned women for men, IBM released a dataset called Diversity in Faces. The dataset was filled with more than 1 million images annotated with people’s race, gender and other attributes. It was an attempt to create the type of large, balanced training dataset that its algorithms were criticized for lacking.
But the images were scraped from the photo-sharing website Flickr without asking the image owners, triggering a huge backlash. And IBM is far from alone. Another big vendor used by law enforcement, Clearview AI, is estimated to have gathered over 60 billion images from places like Instagram and Facebook without consent.
These practices have ignited another set of debates on how to ethically collect data for facial recognition. Biometric databases pose huge privacy risks, Jain says. “These images can be used fraudulently or maliciously,” such as for identity theft or surveillance.
One potential fix? Fake faces. By using the same technology behind deepfakes, a growing number of researchers think they can create the type and quantity of fake identities needed to train models. Assuming the algorithm doesn’t accidentally spit out a real face, “there’s no problem with privacy,” says Pavel Korshunov, a computer scientist also at the Idiap Research Institute.
Creating the synthetic datasets requires two steps. First, generate a unique fake face. Then, make variations of that face under different angles, lighting or with accessories. Though the generators that do this still need to be trained on thousands of real images, they require far fewer than the millions needed to train a recognition model directly.
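The two-step recipe can be outlined in code. The sketch below is structural only: generate_identity and generate_variant are hypothetical stand-ins for an identity-conditioned image generator (an ordinary text-to-image prompt would not keep the same face across variations), and the group and condition lists are illustrative.

```python
# Structural sketch of the two-step synthetic dataset recipe.
# generate_identity / generate_variant are hypothetical helpers standing in for an
# identity-conditioned face generator; they are not a real library API.

def build_synthetic_dataset(generate_identity, generate_variant, n_identities, groups, conditions):
    """Step 1: one unique fake face per identity, balanced across demographic groups.
    Step 2: variations of that face under different angles, lighting or accessories."""
    dataset = []
    for i in range(n_identities):
        group = groups[i % len(groups)]                  # round-robin keeps groups balanced
        base = generate_identity(group)                  # step 1: a new, unique synthetic face
        variants = [generate_variant(base, c) for c in conditions]   # step 2: controlled variations
        dataset.append({"identity": i, "group": group, "images": [base] + variants})
    return dataset

groups = ["African", "Asian", "Caucasian", "Indian"]     # the groups tested in the study
conditions = ["profile view", "dim lighting", "wearing glasses"]
```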
Now, the challenge is to get models trained with synthetic data to be highly accurate for everyone. A study submitted July 28 to arXiv.org reports that models trained with demographically balanced synthetic datasets were better at reducing bias across racial groups than models trained on real datasets of the same size.
In the study, Korshunov, Kotwal and colleagues used two text-to-image models to each generate about 10,000 synthetic faces with balanced demographic representation. They also randomly selected 10,000 real faces from a dataset called WebFace. Facial recognition models were individually trained on the three sets.
When tested on African, Asian, Caucasian and Indian faces, the WebFace-trained model achieved an average accuracy of 85 percent but showed bias: It was 90 percent accurate for Caucasian faces and only 81 percent for African faces. This disparity probably stems from WebFace’s overrepresentation of Caucasian faces, Korshunov says, a sampling issue that often plagues real-world datasets that aren’t purposefully trying to be balanced.
Though one of the models trained on synthetic faces had a lower average accuracy of 75 percent, it had only a third of the variability of the WebFace model between the four demographic groups. That means that even though overall accuracy dropped, the model’s performance was far more consistent regardless of race.
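The comparison boils down to two numbers per model: average accuracy and how much accuracy varies across demographic groups. Here is a hedged sketch of that bookkeeping; the Caucasian and African figures come from the article's WebFace-trained model, while the Asian and Indian values are placeholders, not reported results.

```python
# How the fairness comparison can be quantified: mean accuracy vs. cross-group spread.
import statistics

def summarize(per_group_accuracy):
    """per_group_accuracy: e.g. {"African": 0.81, "Caucasian": 0.90, ...}"""
    values = list(per_group_accuracy.values())
    return {
        "mean_accuracy": statistics.mean(values),
        "spread": max(values) - min(values),      # gap between best- and worst-served group
        "std_dev": statistics.pstdev(values),     # the "variability" compared in the study
    }

# Asian and Indian values below are illustrative placeholders, not the study's numbers.
webface_model = summarize({"African": 0.81, "Asian": 0.85, "Caucasian": 0.90, "Indian": 0.85})
```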
This drop in accuracy is currently the biggest hurdle for using synthetic data to train facial recognition algorithms. It comes down to two main reasons. The first is a limit in how many unique identities a generator can produce. The second is that most generators tend to generate pretty, studio-like pictures that don’t reflect the messy variety of real-world images, such as faces obscured by shadows.
To push accuracy higher, researchers plan to explore a hybrid approach next: Using synthetic data to teach a model the facial features and variations common to different demographic groups, then fine-tuning that model with real-world data obtained with consent.
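A minimal sketch of that hybrid idea, assuming a hypothetical train_one_epoch helper: pretrain the embedder on synthetic identities, then fine-tune at a much lower learning rate on a smaller, consented real-world set. The epoch counts and learning rates are placeholders.

```python
# Hybrid training sketch: synthetic pretraining followed by consented real-data fine-tuning.
import torch

def pretrain_then_finetune(model, synthetic_loader, real_loader, train_one_epoch):
    # Stage 1: learn general face structure and demographic variation from synthetic data.
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    for _ in range(10):
        train_one_epoch(model, synthetic_loader, opt)
    # Stage 2: adapt to real-world messiness using consented data and a gentler learning rate.
    opt = torch.optim.Adam(model.parameters(), lr=1e-5)
    for _ in range(2):
        train_one_epoch(model, real_loader, opt)
    return model
```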
The field is advancing quickly — the first proposals to use synthetic data for training facial recognition models emerged only in 2023. Still, given the rapid improvements in image generators since then, Korshunov says he’s eager to see just how far synthetic data can go.
But accuracy in facial recognition can be a double-edged sword. If inaccurate, the algorithm itself causes harm. If accurate, human error can still come from overreliance on the system. And civil rights advocates warn that too-accurate facial recognition technologies could indefinitely track us across time and space.
Academic researchers acknowledge this tricky balance but see the outcome differently. “If you use a less accurate system, you are likely to track the wrong people,” Kotwal says. “So if you want to have a system, let’s have a correct, highly accurate one.”
