Intro
Monitoring the outputs of your AI models is an essential part of the AI lifecycle, every bit as important as model training. AI monitoring can help you detect bias, catch data drift, and evaluate performance, reducing adverse impacts on your business and your customers.
This is crucial because all machine learning models encounter some degree of bias, drift, and performance degradation once deployed, and over time that degradation compounds. A recent study from MIT, Harvard, and the University of Monterrey found that 91% of machine learning models degrade over time.
If your organization is managing many different models and AI use cases, it can be hard to consistently implement best practices for risk, compliance, and quality management from a monitoring standpoint. Fairo’s AI Governance platform makes it easy to ensure you have the appropriate monitoring configured for your specific model and use case, across your technology stack.
In this tutorial, we will demonstrate how Fairo streamlines the configuration of Monitors on your AI/ML models to help you reduce bias, catch drift, and maintain performance. In about 10 minutes, you will be able to create and set up a bespoke monitor for your AI models.
Databricks Inference Tables
If you are a Databricks customer, you have a wealth of tools at your disposal to efficiently monitor and evaluate your AI models. One such tool is the inference table. Every model in Databricks can have an inference table linked to it to track the predictions from your model, along with the features and protected class information from your production data.
Inference tables are a powerful tool because they can be monitored and refreshed automatically and incrementally, saving time and compute resources.
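For example, once an inference table exists, you can query it like any other Delta table from a notebook. Here is a minimal sketch; the table name and column names below are illustrative placeholders, not the ones Databricks generates for you:

from pyspark.sql import functions as F

# Illustrative table and column names; substitute your own
inference = spark.table("main.default.churn_model_inference")

# Summarize the last week of predictions by predicted class
(inference
    .filter(F.col("timestamp") >= F.date_sub(F.current_date(), 7))
    .groupBy("predicted_outcome")
    .count()
    .show())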
Connecting Databricks to Fairo
For an in-depth guide, open the guide on the Databricks plugin page under Organization/Integrations. It explains how to create a service principal with the correct permissions, add its ID and secret to Fairo, and verify the connection securely.
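If you want to double-check the service principal’s credentials yourself, a quick way is the official databricks-sdk, which supports OAuth machine-to-machine authentication. The host, ID, and secret below are placeholders:

from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://<your-workspace>.cloud.databricks.com",  # placeholder
    client_id="<service-principal-application-id>",        # placeholder
    client_secret="<service-principal-oauth-secret>",      # placeholder
)

# Prints the service principal's identity if authentication succeeds
print(w.current_user.me().user_name)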
Setting Databricks as the MLFlow Tracking Server
Fairo uses MLFlow as its primary tracking server and model registry. We also offer a Fairo-hosted, fully managed MLFlow server for users who do not currently have an MLOps platform or are looking for a lower-cost alternative.
Every Databricks workspace also includes an MLFlow server, and this is the primary interface we use to connect to the model registry.
In Organization/Integrations, select the MLFlow integration and set Databricks as the tracking server. If you also set Auto-Log Metrics to On, the metrics you log with your experiments will be loaded into Fairo automatically and can be linked to your models, tests, and evaluations.
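For reference, pointing a script at the Databricks-hosted MLFlow tracking server and logging a metric looks roughly like this; the experiment path and metric value are illustrative:

import mlflow

mlflow.set_tracking_uri("databricks")
mlflow.set_registry_uri("databricks-uc")  # Unity Catalog model registry

mlflow.set_experiment("/Shared/fairo-churn-demo")  # illustrative path
with mlflow.start_run():
    mlflow.log_metric("f1_score", 0.82)  # illustrative value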
Build Example in Databricks
For those who would like to follow along with an example, we’re using the MLOps end-to-end churn model from Databricks in this tutorial.
Setup
To get started with this example in your workspace, simply run the following code in your Databricks notebook:
import dbdemos
dbdemos.install("mlops-end2end", catalog="<add your catalog>", schema="<add your schema>")
This will create the notebooks and load the datasets you need to proceed.
Run the Notebooks and Create Inference Table
Run all the notebooks in mlops-end2end/01-mlops-quickstart.
At the bottom of the notebook mlops-end2end/01-mlops-quickstart/05_batch_inference, add and run the following code to create and save the inference table.
import random

from pyspark.sql import functions as F
from pyspark.sql.types import IntegerType, StringType

# preds_df, catalog, and schema are defined earlier in the 05_batch_inference notebook
target_table = "fairo.mlops_example.mlops_churn_bronze_customers_inference"

# Random generators for protected classes for fairness evaluation
@F.udf(returnType=StringType())
def random_ethnicity():
    # StringType UDFs must return strings
    return str(random.randint(1, 6))

@F.udf(returnType=IntegerType())
def random_education():
    return random.randint(1, 3)

# Convert outcome from Yes/No to 1/0
@F.udf(returnType=IntegerType())
def convert_outcome_to_int(x):
    return 1 if x == "Yes" else 0

# Create the inference dataframe
inference_df = (preds_df
    # Add random education level
    .withColumn("education_level", random_education())
    # Add random ethnicity
    .withColumn("ethnicity", random_ethnicity())
    # Add observed outcome as an int field
    .withColumn("observed_outcome", convert_outcome_to_int(F.col("churn")))
    # Add predicted outcome as an int field
    .withColumn("predicted_outcome", convert_outcome_to_int(F.col("predictions")))
    # Add timestamp
    .withColumn("timestamp", F.current_timestamp())
    # Add model id
    .withColumn("model_id", F.lit(f"{catalog}.{schema}.mlops_churn"))
)

inference_df.write.mode("overwrite").option("overwriteSchema", "true").saveAsTable(target_table)
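As an optional sanity check, you can confirm the table landed with the expected columns before moving on:

# Optional: verify the inference table was written with the expected columns
spark.table(target_table).select(
    "model_id", "timestamp", "predicted_outcome", "observed_outcome",
    "ethnicity", "education_level"
).show(5)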
Creating a Fairo Model Asset from a Databricks Model
After running the notebooks and setting up the tables, make sure that the model from the winning experiment is registered in Databricks. The model should be saved in the catalog and schema that you selected.
To load the model into Fairo manually (Fairo can also find models automatically using resource discovery, which we won't cover in this post), navigate to the Models page and click Create a New Model.
Select MLFlow as the model source, then choose your schema and model ID from the dropdowns.
Here is where you can configure details about the model, including the base model (if applicable), type, architecture, supported languages, citations/references, license, model card, lifecycle stage, and inference table.
To add the inference table, just click the button and select the table created in the previous step.
Important: Be sure to set the lifecycle stage to Implementation and Deployment if you are generating inferences in production for customers, as this will trigger a number of AI-posture-related tests from Fairo that verify your model is behaving as expected.
After saving, you will see the tests badge appear on the left-hand side of the model details. These tests are Fairo’s AI posture tests, which serve a variety of quality, risk, and compliance functions.
Clicking the tests badge will bring you to the evidence test screen, where Fairo evaluates table monitors for all of your models in the Implementation and Deployment lifecycle stage.
If the test fails for any of your models, a 'Create Monitor' button will appear. Clicking it will help you easily configure the monitor from within Fairo’s UI.
To make the monitor configuration even easier, Fairo’s AI agent can detect which fields store predictions, the timestamp, and the model ID, as well as what type of problem the model is solving (this is saved on the model card too) and which features can be used to evaluate the model for bias.
The agent will not create any monitors for you; it only fills in the requested fields in the modal for you to review. Note that the action will not read your data, just the schema, and it guesses the fields and slicing expressions based on the available information, including column name, type, length, and description.
If you are satisfied with the monitor configuration, click Create Monitor and Fairo will call the Databricks API to set it up automatically.
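Fairo makes this call for you, but for context, creating an inference-log monitor directly through the databricks-sdk looks roughly like the sketch below. The assets directory and output schema are placeholders, and the exact class and method names can vary between SDK versions:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import (
    MonitorInferenceLog,
    MonitorInferenceLogProblemType,
)

w = WorkspaceClient()

w.quality_monitors.create(
    table_name="fairo.mlops_example.mlops_churn_bronze_customers_inference",
    assets_dir="/Workspace/Shared/monitors",   # placeholder
    output_schema_name="fairo.mlops_example",  # where metric tables are written
    inference_log=MonitorInferenceLog(
        granularities=["1 day"],
        timestamp_col="timestamp",
        model_id_col="model_id",
        prediction_col="predicted_outcome",
        label_col="observed_outcome",
        problem_type=MonitorInferenceLogProblemType.PROBLEM_TYPE_CLASSIFICATION,
    ),
)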
Once the test is complete, the test status will update to 'OK.' You now have a monitor set up, and Fairo will continuously make sure that the monitor is in place and working.
The results of these tests are associated with a variety of controls in the platform and can provide necessary evidence for frameworks and laws, including Colorado's life insurance regulation and AI Act, ISO 42001, the NIST AI Risk Management Framework, and the EU AI Act.
Next Steps
With Fairo, you don’t need to worry about whether your model has the appropriate monitoring set up. Based on a variety of data we capture in our platform, including intended uses, impacted party personas, and technical information, we are able to keep your AI monitoring posture consistently in line with best practices for risk, quality, and regulatory compliance.
Now that you know how to set up inference tables for your models and monitor them effectively, we can start to go into deeper detail on testing and evaluating AI models. We will be publishing follow-ups to this tutorial in the coming weeks that show you how to test your AI agents and reduce hallucinations like a pro. Even if your organization is new to AI, Fairo will make sure you operate like a seasoned professional across all aspects of AI governance, risk, quality, and compliance.