AI Data Acquisition and Governance

Best Practices for Deploying Successful AI

If you aren’t working with artificial intelligence, you will be soon. We interact with AI nearly every day, which is pushing many companies to experiment in the space. Wherever you are in that process, you will likely encounter some challenges along the way. Two of the more complex elements of successfully implementing AI within your business are data acquisition and governance.

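There are several best practices that can guide you in building and deploying effective AI solutions. Setting yourself up for long-term success will ultimately require you to establish a comprehensive AI governance framework (especially around data governance) and a scalable data pipeline.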

We’ll break down the key considerations for AI governance and walk step by step through data pipeline creation and maintenance.

Defining AI Governance

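AI governance is the framework that oversees an organization’s use and implementation of AI. How each organization defines that framework is influenced by its industry, internal company rules, regulations, and local laws. There is no one-size-fits-all approach; each organization should choose what best fits its needs. In general, though, three key areas frequently appear in AI governance frameworks: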

Performance

How you measure your model’s performance is an important factor in development. Your team should develop a set of metrics that you’ll track from the initial model build through post-deployment to ensure the model performs (and continues to perform) as expected. There are a couple of critical factors to incorporate into your metrics:

Accuracy

On the one hand, when it comes to accuracy you want to consider the precision and recall of your model. Is it meeting your desired confidence thresholds when making predictions? If not, you’ll need to iterate. On the other hand, you’ll want to consider whether your model has all of the context it requires to make accurate predictions. Your data will give you the answer here, but ensure it includes all of your use cases and known edge cases.
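As a rough illustration of the precision and recall check described above, here is a minimal sketch in plain Python. The labels, scores, and the 0.5 confidence threshold are hypothetical values chosen for the example, not figures from any real model.

```python
from typing import Sequence


def precision_recall(
    y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> tuple[float, float]:
    """Return (precision, recall) for binary labels at a given score threshold."""
    # A prediction counts as positive when the model score meets the threshold.
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, preds) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, preds) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical ground-truth labels and model confidence scores.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.8, 0.3, 0.6, 0.1, 0.7, 0.2]

precision, recall = precision_recall(y_true, scores, threshold=0.5)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Raising the threshold usually trades recall for precision, so it can help to evaluate several thresholds before deciding whether the model needs another iteration.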

Bias/Fairness

Incorporate metrics that measure bias in your model’s performance. There are third-party tools available that can help track this. Bias can come from sampling (how you collected the data, from where, and by whom) and also from who you have annotating your data.

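For example, leading facial recognition software has been shown to have much higher error rates for people with darker skin than for people with lighter skin. Error rates exceeded 25% for Black women, compared with only 1% for white men. This is a problem with both the data that was collected (people of color were underrepresented) and who labeled the data (primarily white annotators), as that lack of diversity reflected poorly in the final solution.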
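One simple way to surface this kind of disparity is to compute error rates separately for each demographic group and compare them. The sketch below assumes each evaluation record carries a group tag; the group names and records are hypothetical, and a real fairness review would look at more than this single metric.

```python
from collections import defaultdict


def error_rate_by_group(records):
    """records: iterable of (group, true_label, predicted_label) tuples.

    Returns a dict mapping each group to its share of wrong predictions.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for group, truth, pred in records:
        totals[group] += 1
        if truth != pred:
            errors[group] += 1
    return {group: errors[group] / totals[group] for group in totals}


# Hypothetical evaluation records, for illustration only.
records = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 0), ("group_b", 1, 0), ("group_b", 0, 1),
]

rates = error_rate_by_group(records)
# The gap between the best- and worst-served groups is one simple bias signal.
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```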

There are best practices you can implement in your AI data acquisition and governance frameworks to reduce bias in AI.

Transparency

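Your organization may be subject to legislation that requires you to show how your AI model makes decisions. The General Data Protection Regulation (GDPR) in Europe is one such example, giving consumers a right to transparency. Even if you aren’t subject to regulation, the explainability of your AI model remains critical for end users and for reproducibility. As you build your model, thoroughly document how it works. Your governance framework can address your documentation practices and your level of commitment to transparency.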

Ethics

Ethics is the third area that’s very common to find in an AI governance framework. Ethics plays a role throughout AI implementations, starting with ensuring the intent of the solution is ethical and ending with whether the model continues to perform as intended. In this section, you’ll want to define what responsible AI looks like for your organization, from pilot to production, and what kind of processes you’ll have in place to ensure those requirements are met.

Data Governance: Areas to Address

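Data governance refers to how your organization manages the data in its systems. It is a key component of an organization’s overall AI governance framework. In your data governance, you’ll likely want to include the following components: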

Availability

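Your data is accessible and available to the people who need it. This section should answer the question of who in the organization can see what.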

Usability

Your data is structured, labeled, and easy to use. Data scientists spend large amounts of time wrangling data to make it usable. To reduce this time, have data pipelines and processes in place that make data preparation faster, easier, and more scalable.

Integrity

Your data maintains its structure, qualities, and completeness across its lifecycle. Your data pipeline should center on ensuring the data you use is consistent throughout your model build process.
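One lightweight way to check that a dataset has not changed between pipeline stages is to fingerprint it and to verify that required fields are still present. The following is a minimal sketch using only the Python standard library; the row layout and the field names (`id`, `label`) are assumptions made for the example.

```python
import hashlib
import json


def fingerprint(rows):
    """Return a SHA-256 hex digest over rows serialized in a canonical form."""
    digest = hashlib.sha256()
    for row in rows:
        # sort_keys makes the digest independent of dictionary key order.
        digest.update(json.dumps(row, sort_keys=True).encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


def missing_fields(rows, required):
    """Return indexes of rows where a required field is absent or None."""
    return [
        i for i, row in enumerate(rows)
        if any(row.get(field) is None for field in required)
    ]


# Hypothetical rows, for illustration only.
rows_v1 = [{"id": 1, "label": "cat"}, {"id": 2, "label": "dog"}]
rows_reordered_keys = [{"label": "cat", "id": 1}, {"label": "dog", "id": 2}]
rows_changed = [{"id": 1, "label": "cat"}, {"id": 2, "label": None}]

print(fingerprint(rows_v1) == fingerprint(rows_reordered_keys))
print(missing_fields(rows_changed, ["id", "label"]))
```

Storing the fingerprint alongside each dataset version gives later stages a cheap way to confirm they are training on the data that was reviewed.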

Security

Your data is protected from corruption, unauthorized use, or modification across its lifecycle. The data used for AI can often include personal information. Have security checks in place that are appropriate for the type of data you’re using, especially if that information is sensitive.

Learn more about the AI and data protection regulations and certifications that you should be aware of or think about when outsourcing data collection and annotation.

Training Data Pipeline and Maintenance

As we refer repeatedly to data pipelines, it’s helpful to know best practices for building and maintaining these processes. Let’s walk through a full data pipeline from start to finish:

1. Data Acquisition

You’ll collect data from one or a variety of sources. These may include internal sourcing, readily available data, open-source datasets, or third-party vendors. The goal is to source data that covers all possible use cases and edge cases for your end users. Be sure you’re sourcing your data ethically.

2. Data Annotation

In the next step of the data pipeline, you’ll perform data annotation (e.g., image classification, audio transcription, or other types). Who you select to label your data is very important; these people need to have diverse backgrounds and perspectives to reduce the potential for bias. For large annotation jobs, companies often rely on third-party crowd workers sourced from around the globe.
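When several annotators label the same item, one common approach is to take the majority label and record how strongly the annotators agreed. The sketch below is illustrative only: majority voting is one aggregation technique among many, and the item IDs, labels, and 0.75 review cut-off are hypothetical.

```python
from collections import Counter


def aggregate_labels(votes):
    """votes: {item_id: [label, ...]} -> {item_id: (winning_label, agreement)}.

    agreement is the fraction of annotators who chose the winning label.
    On a tie, the label encountered first wins.
    """
    result = {}
    for item_id, labels in votes.items():
        label, count = Counter(labels).most_common(1)[0]
        result[item_id] = (label, count / len(labels))
    return result


# Hypothetical annotations from three annotators per item.
votes = {
    "img_1": ["cat", "cat", "dog"],
    "img_2": ["dog", "dog", "dog"],
}

aggregated = aggregate_labels(votes)
# Items with low agreement are candidates for a second look.
needs_review = [item for item, (_, agreement) in aggregated.items() if agreement < 0.75]
print(aggregated, needs_review)
```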

3. Data Audit

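While you should audit your data at every stage of this process, it’s especially important after annotation to make sure your data labels are accurate and unbiased. The annotations should account for all of your use cases. Once you’ve completed the data audit and found that your labeled data meets your accuracy standards, you’re ready to train your model and deploy it.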
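One way to run such an audit is to compare a sample of the labels against a small, trusted "gold" set and check the result against your accuracy standard. This is a minimal sketch; the item IDs, labels, and the 0.95 standard are assumptions chosen for the example.

```python
def audit_labels(labels, gold, min_accuracy=0.95):
    """Compare labels with a trusted gold subset.

    Returns (accuracy, passed, mismatched_ids), computed only over items
    that appear in both mappings.
    """
    shared = [item for item in gold if item in labels]
    if not shared:
        raise ValueError("no overlap between labels and gold set")
    mismatched = [item for item in shared if labels[item] != gold[item]]
    accuracy = 1 - len(mismatched) / len(shared)
    return accuracy, accuracy >= min_accuracy, mismatched


# Hypothetical annotated labels and gold-standard answers.
labels = {"a": "cat", "b": "dog", "c": "cat", "d": "bird", "e": "dog"}
gold = {"a": "cat", "b": "dog", "c": "dog", "d": "bird"}

accuracy, passed, mismatched = audit_labels(labels, gold, min_accuracy=0.95)
print(accuracy, passed, mismatched)
```

The mismatched IDs give reviewers a concrete list to re-check before the data is used for training.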

4. Model Updates

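Few use cases rely on a static model. In most cases, you’ll need to update your model frequently to reflect the real world and ever-changing data. After deployment, your data pipeline should continue to serve you as you keep creating new training data to avoid model drift or stagnation. This component of model maintenance is often underestimated, but it’s critical for long-term success in AI.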
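One way to notice drift is to compare the distribution of a feature or model score in production against the distribution seen at training time. The sketch below uses the population stability index (PSI), which is one technique among several and not something the article prescribes. The data, the bin count, and the 0.25 alert level (a commonly cited rule of thumb) are assumptions.

```python
import math
from typing import Sequence


def population_stability_index(
    expected: Sequence[float], actual: Sequence[float], bins: int = 10
) -> float:
    """PSI between a reference sample and a new sample, using equal-width bins
    derived from the reference sample's range."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins
    if width == 0:
        width = 1.0  # all reference values identical; everything lands in one bin

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) // width)
            idx = min(max(idx, 0), bins - 1)  # clamp out-of-range values
            counts[idx] += 1
        # A small floor avoids taking the log of zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref = proportions(expected)
    cur = proportions(actual)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical model scores at training time and in production.
training_scores = [i / 100 for i in range(100)]
production_scores = [0.5 + i / 200 for i in range(100)]

psi_same = population_stability_index(training_scores, training_scores)
psi_shifted = population_stability_index(training_scores, production_scores)
# Assumed alert level; tune it to your own data and tolerance.
drift_detected = psi_shifted > 0.25
print(psi_same, psi_shifted, drift_detected)
```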

As an example, we’ve broken down what a comprehensive data pipeline for autonomous vehicles might look like.

In Summary: AI Best Practices

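If anything is clear, it’s that AI data acquisition and governance frameworks are critical to shaping your organization’s AI strategy. Beyond these elements, your team will need to answer many more questions throughout the model build process. At a high level, these questions generally cover the following areas: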

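  • Know the problem. Can you solve your problem with AI?
  • Understand the data. Do you have all the data you need to train an AI algorithm?
  • Identify key metrics. Which metrics (around accuracy, efficiency, cost savings, bias, and so on) indicate that your model is successful?
  • Audit performance. Do you have a method for identifying model drift?
  • Iterate. Are you consistently retraining and tuning your model, even after deployment?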

With the right tools and processes in place, you’ll be better set up for success. Learning from the achievements of others in this space is likewise an essential step toward developing AI pipelines and frameworks that’ll equip your organization to deploy AI with confidence and at scale.

If your team needs help along the way, consider working with us at Appen. We have the experience, expertise, services, and solutions to support you. Learn more about our assisted data annotation platform, or contact us.