Intro
You’ve heard about the need for AI governance and understand what it is. Perhaps you have begun thinking about developing a strategy and framework to ensure that your organization is on track to meet its AI governance goals. A critical part of your strategy will require a deep understanding of the tools in your ecosystem and how they contribute to the overall Auditability of your AI.
Perhaps you have an AI governance expert in-house, are working with a consultant, or are leveraging online resources. Regardless of the path you choose, you must become familiar with the tools in your ecosystem as they relate to AI Auditability. Leveraging these tools appropriately will help your organization mitigate risks and maximize rewards associated with your AI strategy.
AI Governance is Trending
Even before the release of ChatGPT in late 2022, search interest in AI governance was trending upwards on Google. This was due to many large companies, likely operating with industry knowledge of generative AI's powerful capabilities, assembling teams and task forces dedicated to AI governance, policy, ethics, and software. Furthermore, around this time, laws such as the EU AI Act and frameworks such as the NIST AI Risk Management Framework were released to the public for review and discussion. These developments, together with the public release of generative AI algorithms in late 2022, have thrust AI governance into the mainstream. AI Auditability is one of the first components an organization must consider when thinking about its AI governance strategy. (See fig 1.0)
Key Features of AI Auditability
Large companies such as IBM, Microsoft, and Google have begun rolling out AI governance tools alongside open-source guides and software. Smaller technology companies, like Credo and Fairo, are also offering more tailored SaaS-based solutions to help organizations govern their AI. Independent of the solution your organization chooses, your AI consumption, development, and implementation processes must be auditable. According to IBM, “Many kinds of skillsets are needed in the AI lifecycle, including product owners, model developers, model validators, and model deployment engineers.” To make AI solutions auditable, organizations will need to work towards some of the following goals:
• Enhanced collaboration between different skillsets through a common taxonomy of terms and “facts” about model development and deployment
• A complete view of models through collection of metadata across the lifecycle and across tools
• A catalog (or registry) of all models and data used to train those models
• Documented workflows and processes for establishing accountability and checks at each point in the lifecycle, such as bringing the data science team closer to the CDO (Chief Data Officer) or cross-functional use-case reviews
• Standards and policies that can be automatically enforced and monitored through technology
• Guidance when extending a governance program to include both data and AI (either internally developed or externally procured).
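As a rough illustration of the catalog and metadata goals above, the sketch below keeps a minimal in-memory model registry. The class names (`ModelRecord`, `ModelRegistry`), the field names, and the sample values are all hypothetical, chosen only for illustration; an organization would more likely rely on a dedicated registry product than on code like this.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ModelRecord:
    """One catalog entry: who owns a model version and what it was trained on."""
    name: str
    version: str
    owner: str                   # accountable person or team
    training_datasets: tuple     # identifiers of the data used for training
    lifecycle_stage: str = "development"
    registered_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class ModelRegistry:
    """Hypothetical in-memory catalog keyed by (name, version)."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        key = (record.name, record.version)
        if key in self._records:
            # Refuse silent overwrites so the history stays auditable.
            raise ValueError(f"{record.name} v{record.version} is already registered")
        self._records[key] = record

    def models_trained_on(self, dataset_id):
        """Return every record whose training data includes dataset_id."""
        return [r for r in self._records.values() if dataset_id in r.training_datasets]


registry = ModelRegistry()
registry.register(ModelRecord("churn-model", "1.0", "data-science", ("crm_export_2023",)))
registry.register(
    ModelRecord("churn-model", "1.1", "data-science", ("crm_export_2023", "support_tickets"))
)

# Which model versions would be affected if this dataset turned out to be flawed?
affected = registry.models_trained_on("support_tickets")
print([f"{r.name} v{r.version}" for r in affected])
```

Linking each model version to its training data is what lets an auditor answer questions such as "which models were trained on this dataset?" without interviewing every team.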
Unlike end-to-end AI delivery platforms that aim to implement and control the entire AI project lifecycle, pure “AI governance tools” tend to focus more on platform-agnostic AI governance and Auditability. Such tools can be useful for large organizations that use many different systems, or for small projects without the budget to purchase an end-to-end system (e.g., Databricks, DataRobot). These tools can help ensure that AI complies with a specific industry’s regulations and security standards, and can help manage model documentation.
Generally speaking, there is no one-size-fits-all software solution to make your AI auditable. Implementing your AI governance strategy and framework will require multiple software tools working together. Your organization may already use many of these tools; others you may need to purchase or develop internally. In the upcoming sections we will highlight various software types and tools that can contribute to overall AI Auditability, thereby helping your organization implement an AI governance framework and strategy successfully.
Data Lineage and ETL Tools
Data lineage is the process of tracking the flow of data over time, clearly documenting where the data originated, how it has evolved, and its ultimate destination within the data pipeline. These tools provide a record of data throughout its entire lifecycle, including source information and any transformations that have been applied during any ETL or ELT processes. This type of documentation is crucial as it enables users to trace different touchpoints along the data journey, allowing organizations to vouch for accuracy and consistency while also allowing them to trace an error back to its roots. The ability to ensure data quality within an organization is critical.
The data journey should specify:
• Data provenance: third-party apps, APIs, CSV files, databases, data warehouses, etc.
• Data point transformations, allowing you to track data back to its origins.
• Datastores, where data is either temporarily or permanently loaded: files, databases, data warehouses, and data lakes.
• Integrations with other tools, such as business intelligence software or CRMs.
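The data journey described above can be sketched in a few lines of code. This is a minimal illustration, not how any of the tools below work internally: the `LineageStep` class, the `trace_to_origin` function, and the dataset names are hypothetical, and the sketch assumes each dataset is produced by at most one step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageStep:
    """One hop in the data journey: an input, a transformation, an output."""
    source: str          # where the data came from (provenance)
    transformation: str  # what was done to it
    destination: str     # datastore or tool the result was loaded into


def trace_to_origin(steps, dataset):
    """Walk backwards from a dataset to its original source.

    Assumes each destination is produced by at most one step (a simple chain).
    """
    produced_by = {step.destination: step for step in steps}
    path = []
    seen = set()  # guards against accidental cycles in the recorded steps
    while dataset in produced_by and dataset not in seen:
        seen.add(dataset)
        step = produced_by[dataset]
        path.append(step)
        dataset = step.source
    return path


# Hypothetical pipeline: API extract -> cleaned warehouse table -> BI report.
steps = [
    LineageStep("crm_api", "extract daily accounts", "raw.accounts"),
    LineageStep("raw.accounts", "drop duplicates, mask emails", "warehouse.accounts"),
    LineageStep("warehouse.accounts", "aggregate by region", "bi.accounts_by_region"),
]

for step in trace_to_origin(steps, "bi.accounts_by_region"):
    print(f"{step.destination} <- [{step.transformation}] <- {step.source}")
```

Tracing backwards from the report to the API is the same operation an analyst performs when chasing an error back to its roots.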
DataRobot: Provides automated machine learning along with model explainability features, ETL, and data lineage. DataRobot claims that 40% of the Fortune 50 companies employ its cloud tools, as do 8 of the top 10 US banks, 5 of the top 10 global banks, 7 of the top 10 pharmaceutical companies, 7 of the top 10 telecommunications companies, and 6 of the top 10 global manufacturers.
Alation: A data catalog that helps in data governance, allowing data stewards and experts to annotate and ensure the quality of data used in AI models. Alation mentions customers such as Salesforce, Discover, Crocs, Pfizer, and more.
Keboola: A cloud-based data platform as a service. With Keboola you can automate your entire data pipeline, from collecting structured and unstructured data to transforming and storing it for analysis. At each step of the pipeline, Keboola automatically tracks all relevant metadata and constructs logs. This gives you a granular view of data lineage so you can identify the root cause of errors faster.
Atlan: A cloud-based data democratization platform designed to help businesses manage their entire data ecosystems with tools for data discovery, data cataloging, data governance, and embedded collaboration.
Collibra: A data intelligence company with a cloud-based platform that features flexible governance, continuous quality, and built-in privacy for all data types. Collibra is best suited to creating an inventory of data assets, capturing information about them, and data governance.
Model Explainability Tools
IBM AI Explainability 360: An open-source library that helps understand data and machine learning models. This extensible toolkit can help you comprehend how machine learning models predict labels by various means throughout the AI application lifecycle.
SHAP (SHapley Additive exPlanations): A popular open-source tool for model explainability. SHAP can help explain the predictions of Machine Learning models in a way that humans can understand. By assigning a value to each input feature, it shows how and to what extent each feature contributed to the final prediction result. It provides clarity on how the model made its decision and can identify the most important features.
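To show what "assigning a value to each input feature" means, the sketch below computes exact Shapley values by brute force for a tiny toy model. It deliberately does not use the SHAP library, which relies on far more efficient approximations. The `shapley_values` function, the toy `model`, and the all-zero baseline are assumptions made for illustration, and "missing" features are simply replaced by their baseline value.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, instance, baseline):
    """Exact Shapley values for one prediction.

    Features outside a coalition are replaced by their baseline value.
    Cost grows as 2^n, so this is only practical for a handful of features.
    """
    n = len(instance)
    features = range(n)

    def value(coalition):
        # Model output when only the coalition's features take their real values.
        x = [instance[i] if i in coalition else baseline[i] for i in features]
        return model(x)

    contributions = []
    for i in features:
        others = [j for j in features if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        contributions.append(total)
    return contributions


def model(x):
    # Toy scoring function with an interaction between features 1 and 2.
    return 2.0 * x[0] + x[1] * x[2]


instance = [3.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, instance, baseline)
print(phi)
```

In this toy case feature 0 receives a contribution of about 6, while the interaction term is split evenly between features 1 and 2 at about 4 each. The contributions add up to the difference between the prediction for the instance and the prediction for the baseline, which is the property that makes these values useful in an audit.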
Ethical and Bias Detection Tools
Fairness Indicators: An open-source tool from Google’s TensorFlow team to detect and remediate unfair biases in machine learning models. Google describes Fairness Indicators as a library that enables easy computation of commonly identified fairness metrics for binary and multi-class classifiers, built to handle large-scale datasets and models.
Aequitas: An open-source bias audit toolkit for machine learning models. It works by comparing the false positive and false negative rates of the “protected or selected” group with those of the “overall reference” group. If the disparity for a “protected or selected” group is between 80 and 125 percent of the value of the reference group, the audit passes.
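The 80 to 125 percent rule can be illustrated with plain Python. The sketch below does not call Aequitas itself; the function names and the sample labels are hypothetical, and only the false positive rate is checked to keep the example short.

```python
def false_positive_rate(y_true, y_pred):
    """FPR = false positives / all actual negatives."""
    negatives = [p for t, p in zip(y_true, y_pred) if t == 0]
    if not negatives:
        raise ValueError("no actual negatives; FPR is undefined")
    return sum(negatives) / len(negatives)


def passes_disparity_check(group_rate, reference_rate, low=0.8, high=1.25):
    """Return (disparity, passed) using the 80-125 percent band."""
    if reference_rate == 0:
        raise ValueError("reference rate is zero; disparity is undefined")
    disparity = group_rate / reference_rate
    return disparity, low <= disparity <= high


# Hypothetical labels (1 = positive) and predictions for two groups.
reference_true = [0, 0, 0, 0, 0, 1, 1, 1]
reference_pred = [1, 0, 0, 0, 0, 1, 1, 0]   # 1 false positive out of 5 negatives
protected_true = [0, 0, 0, 0, 0, 1, 1, 1]
protected_pred = [1, 1, 0, 0, 0, 1, 0, 1]   # 2 false positives out of 5 negatives

ref_fpr = false_positive_rate(reference_true, reference_pred)
grp_fpr = false_positive_rate(protected_true, protected_pred)
disparity, passed = passes_disparity_check(grp_fpr, ref_fpr)
print(f"reference FPR={ref_fpr:.2f}, group FPR={grp_fpr:.2f}, "
      f"disparity={disparity:.2f}, passed={passed}")
```

Here the protected group's false positive rate is twice the reference rate, so the disparity falls outside the band and the check fails.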
Data Governance Tools
Collibra: Focuses on data governance and offers features like data lineage, quality checks, and auditing. User reviews describe it as well structured for the data governance role and a great workflow tool for driving toward governance outcomes, with the ability to tie policy to data assets and to capture a conceptual model effectively. Traceability has proven useful for some users. The architecture is incredibly flexible, but you must know how to deploy it.
BigID: Provides features that help in data discovery and mapping, thereby assisting in GDPR compliance. Its actionable data intelligence platform enables organizations to know their enterprise data and take action. It can automatically discover, catalog, and classify all data types and sources. It can also identify sensitive, personal, regulated, critical, and duplicate data, and manage privacy requirements and regulatory compliance.
Model Observability and Management Tools
Databricks: Offers a unified platform for data science and analytics, which includes some governance features. Maintain a compliant, end-to-end view of your data estate with a single model of data governance for all your structured and unstructured data. Centralize auditing and track usage through automated lineage and monitoring capabilities.
Superwise.ai: Provides monitoring and governance solutions specifically for AI applications. Superwise creates model context through automation and insights so that data scientists, ML engineers, and business operations know when something goes wrong in the real world, without alert fatigue or management trust issues. This lets teams focus on continuously building newer, better models.
ML Experiment Tracking and Deployment Tools
Weights & Biases: Used for tracking experiments in machine learning, including performance metrics and model parameters, and for visualizing model performance. It helps companies turn deep learning research projects into deployed software by helping teams track their models, visualize model performance, and easily automate training and improving models.
MLflow: An open-source platform to manage the machine learning lifecycle, including experimentation, reproducibility, and deployment. It includes components to monitor your model during training and running, store models, load a model in production code, and create a pipeline.
Open Source Model Repository Tools
Git: Though not specialized for AI, version control systems like Git can be essential for tracking changes, thereby assisting in audits.
Audit-AI: A tool to measure and mitigate the effects of discriminatory patterns in training data and the predictions made by machine learning algorithms trained for the purposes of socially sensitive decision processes.
ModelDB: An open-source system to manage machine learning models, allowing tracking of different model versions and parameters. It should not be confused with the unrelated ModelDB used in computational neuroscience, a curated database of published models that addresses the need for access to such models to evaluate their validity and extend their use.
Summary
To meet the goal of building responsible and trustworthy AI, organizations will need to leverage best-in-class tools to not only make their AI strategy auditable but also maximize efficiency in the current fast-paced environment. Without auditable AI, your organization will not be able to verify that it has implemented its governance framework and strategy. Without the right software tools, your organization is liable to fall behind in its consumption, development, and deployment of AI algorithms. These shortfalls can expose your organization to numerous ethical, financial, and security risks.
Implementing an AI governance framework and strategy is a complex undertaking that is further complicated by the need to work with multiple software tools to ensure best-in-class efficiency and auditability. Furthermore, the selection, procurement, and implementation of tools can be a daunting task for organizations.
Fairo is designed to help you navigate these complexities. In addition to providing solutions in many of the categories detailed above, our system sits on top of your existing infrastructure and ecosystem. We provide a window of observability and expertise into all your systems as they relate to responsible AI consumption, development, and deployment.
How Can Fairo Help?
Fairo is a SaaS platform focused on strategy, operations, and governance to give organizations and their users the confidence to adopt AI successfully and rapidly at scale. Fairo is committed to being the industry-standard platform for helping your organization implement its AI strategy and governance framework. Fairo seamlessly integrates into your existing ecosystem and is easy to consume.
AI is a disruptive technology that will change how we work and live. We envision a world where AI is universally built responsibly, trusted, and not feared. We aim to provide an easy-to-use solution that helps organizations procure, develop, and deploy trustworthy AI solutions with confidence.