一家非营利组织如何借助Cloudera与人工智能实现数据转型

qimuai 发布于 2026-3-19 11:01 阅读：3 一手编译

内容来源：https://aibusiness.com/data-management/how-a-nonprofit-transforms-data-with-cloudera-and-ai

内容总结：

非营利组织借助AI技术破解罕见病治疗研究难题，大幅降低科研成本

在近日举行的Gartner数据与分析峰会上，一家专注于罕见病治疗研究的非营利组织“罕见希望”分享了其利用人工智能技术突破资源限制、加速科研进程的实践。该组织联合创始人布莱恩·马丁指出，对于缺乏大型药企巨额资金支持的非营利机构而言，开展大规模疾病研究通常“是一个不切实际的命题”。他以知名非营利组织Every Cure为例，说明类似使命所需的资金规模往往高达数千万美元。

为解决这一困境，“罕见希望”选择了与混合数据与AI平台提供商Cloudera合作。借助该平台的数据处理与分析能力，组织能够从科研论文、医学影像等多种非结构化数据源中自动提取信息，并将其转化为结构化数据。通过平台内置的PySpark等工具，研究团队能高效识别疾病与药物之间的关联与模式，这一过程若依靠传统人工方法可能需要数年时间。

马丁表示，该技术使组织能够“将这类内容交到患者和医生手中，而这在没有数千万美元资金的情况下根本不可能实现”。他特别强调，Cloudera平台不绑定特定AI模型，而是通过集成英伟达NIM微服务，支持用户根据具体应用场景灵活选择大语言模型。这种开放性既避免了自建基础设施的高昂成本，也保障了技术路线的适应性。

目前，“罕见希望”已能通过自动化流程快速生成治疗假设并向公众发布研究成果。团队正进一步探索如何动态监测新研究论文发布后的数据变化，通过增量式分析取代全流程重跑，从而“节省大量时间”。这一案例表明，合理运用生成式AI与数据平台，能为资源有限的科研机构开辟新的可能性。

中文翻译：

由谷歌云赞助
如何选择首个生成式AI应用场景
要开始应用生成式AI，首先应关注能够提升人类信息体验的领域。

该机构开发了从各类科学资料中提取并结构化信息的数据管道，显著加速了研究进程。

当布莱恩·马丁联合创立非营利组织"罕见希望"时，该机构致力于向公众提供罕见疾病治疗假说，却面临缺乏大型药企数百万美元资金和资源的困境。

"对任何非营利组织而言，开展这类工作通常是不切实际的，"马丁上周在奥兰多举行的Gartner数据与分析峰会上接受采访时表示。他指出，知名非营利组织"Every Cure"（致力于利用FDA批准药物治疗罕见疾病）已筹集约7600万美元资金，这凸显了同类使命组织所需的巨额资本。

然而，由于马丁此前已接触过混合数据与AI供应商Cloudera，他认为这家供应商或许能帮助"罕见希望"以较低成本实现使命——大型药企向公众发布罕见疾病假说通常需要高昂支出。马丁未透露"罕见希望"使用Cloudera平台的具体费用。

"这是一个难得的机遇，让我们能将这类内容传递给患者和医生。若没有数千万美元资金，这根本不可能实现。"马丁说道。

Cloudera帮助总部位于华盛顿特区的"罕见希望"实现使命的关键方式之一，是该组织利用其数据与AI平台从多类型数据中获取洞察。

借助该平台，"罕见希望"得以从研究论文、医学影像等资料中提取知识，发现原本需要数年才能识别的关联与模式。

通过Cloudera，"罕见希望"建立了处理科学论文等非结构化数据的数据管道，并将其转化为结构化数据。利用Cloudera的PySpark工具（用于构建数据工程和机器学习管道），该组织能够从科学数据中提取知识，将信息从非结构化转为结构化，随后在Cloudera外部工具平台中使用转化后的数据进行分析，寻找疾病与药物等概念间的关联。"罕见希望"将假说重新导入Cloudera平台，持续开展深入研究。在此过程中，该组织运用大语言模型生成分析报告或假说，并向公众发布。

"从数据、信息、知识、洞察到智慧与影响力的链条，是相当成熟的体系架构，"马丁解释道，"我们利用Cloudera自动化基础环节，打通人类认知轴心与智慧链接，最终实现影响力转化。"

在生成式AI模型选择方面，"罕见希望"并未绑定特定模型。

Cloudera方面也不要求客户使用特定模型。不过该供应商已将英伟达NIM微服务集成至基础设施中，使其能够部署管理大语言模型。英伟达NIM微服务是一套预构建的容器化工具包，包含AI模型、推理引擎、标准API及企业部署AI模型所需的其他工具。

"Cloudera不制造也不销售模型，"Cloudera产品营销与推广副总裁大卫·迪希曼表示，"客户可以自由选择最适合的模型。针对不同场景使用相应模型，切忌试图用单一模型解决所有问题。"

"罕见希望"也认识到，不同模型在不同任务和应用中表现各异，因此获取多种模型至关重要。马丁指出，Cloudera的模型选择功能为该组织带来了额外优势——无需自建模型基础设施，即可直接获取模型、输入数据，并将结果导回Cloudera平台。

"英伟达NIM基础设施使我们能够原生运行部分任务。"马丁补充道。

尽管Cloudera已通过帮助发布研究和白皮书成果，助力"罕见希望"向公众提供各类疾病假说，从而节省大量时间，该组织目前正探索如何监测新研究论文发布时的数据变化。

"我们如何处理数据管道中的各类变更事件，以了解不同的下游影响？"马丁提出，"这类机制能极大节省时间——无需在每次出现新数据时重新运行整个流程，通过增量处理即可分析变化与差异。"

英文来源：

Sponsored by Google Cloud
Choosing Your First Generative AI Use Cases
To get started with generative AI, first focus on areas that can improve human experiences with information.
The organization developed data pipelines that extract and structure information from various scientific sources, significantly accelerating the research process.
When Brian Martin co-founded Rare Hope NFP, a nonprofit focused on giving the public access to hypotheses for rare disease treatment, the organization needed a way to fulfill its purpose despite lacking the millions of dollars and resources of big pharmaceutical companies.
"For any nonprofit to be able to do this type of thing is generally an unreasonable proposition," Martin said in an interview at the Gartner Data & Analytics Summit in Orlando last week. He noted that the well-known nonprofit Every Cure, which seeks to use FDA-approved medicines to treat rare diseases, has raised about $76 million in funding, underscoring the significant capital needed for organizations with a similar mission.
However, with Martin already having experience with the hybrid data and AI vendor Cloudera, he felt the vendor might be able to help Rare Hope execute on its mission without the high costs that big pharmaceutical companies incur when releasing such hypotheses on rare diseases to the public. Martin did not disclose the amount Rare Hopes spends on using the Cloudera platform.
"It's an opportunity to do something and to put that type of content in patients' and doctors' hands that we couldn't ever do without millions and millions of dollars," Martin said.
One way Cloudera was instrumental in helping Washington, D.C.-based Rare Hope fulfill its mission is that the nonprofit used the data and AI platform to gain insight from diverse types of data.
With the platform, Rare Hope was able to extract knowledge from research papers, medical images, and other documentation, identifying correlations and patterns that would have taken years to discover, Martin said.
Using Cloudera, Rare Hope created data pipelines that processed unstructured data, such as scientific papers, and transformed it into structured data. Using a tool in Cloudera called PySpark (for building data engineering and machine learning pipelines), Rare Hopes can extract knowledge from scientific data, transform that information from unstructured to structured, and then use the transformed data in tools and platforms outside Cloudera or run analysis and find correlations between concepts such as a disease and a drug. Rare Hopes brings the hypothesis back into the Cloudera platform and continues to conduct further studies. In that case, Rare Hopes uses a large language model (LLM) to generate an analysis or hypothesis that the organization will present to the public.
"That data information knowledge, insight, wisdom and impact chain, that's a pretty well-established hierarchy," Martin said. "We use Cloudera to automate that base part, that human axis, that wisdom link, to deliver the impact."
As for generative AI models, Rare Hopes is not committed to any specific model.
For its part, Cloudera does not require its customers to use a specific model. However, the vendor has integrated Nvidia NIM microservices into its infrastructure, enabling it to deploy and manage LLMs. Nvidia NIM microservices is a suite of prebuilt, packaged containers that include an AI model, inference engines, standard APIs, and other tools enterprises need to deploy AI models.
"Cloudera doesn't make a model and sell it to you," said David Dichmann, vice president of product marketing and evangelism at Cloudera. "Choose your model, choose your model well, and we recognize you want freedom of choice. Use the right model for the right use case. Do not try to fit everything into one kind of model."
Rare Hope also recognizes that because different models work better for different tasks and applications, it is important to have access to a range of models. Model choice in Cloudera is an added benefit to the nonprofit, Martin said. The nonprofit does not have to build the infrastructure to access the models, provide them with data, and then bring the results back into the Cloudera platform.
"The Nvidia NIM infrastructure gives us the ability to run some of that stuff directly natively," Martin said.
While Cloudera already helps Rare Hope save a significant amount of time by helping deliver different hypotheses on various diseases to the public by publishing its research and white paper findings, the nonprofit is now looking at how to monitor changes to the data when a new research paper is published.
"How do we handle different change events within those pipelines to know what the different downstream effects are?" Martin said. "Those types of things save an immense amount of time because instead of rerunning the entire process over again every time there's new data, we can run incremental processes to analyze the changes and the differences."

商业视角看AI

文章目录

📚 推荐阅读

扫描二维码，在手机上阅读