Example: Structured data extraction & batch inference
This notebook demonstrates the development, logging, and evaluation of a simple agent for structured data extraction. While the agent itself is simple, the same approach can be used to implement custom, and therefore arbitrarily complex, agents for batch inference using MLflow's PythonModel class.
The example applies the custom agent to batch inference over a set of unstructured documents, showing how automated extraction turns raw, unstructured data into organized, actionable information.
This notebook also shows how to use Mosaic AI Agent Evaluation (AWS | Azure) to evaluate extraction accuracy when ground truth data is available.
Define the extraction agent
Below, define your agent code in a single cell. This lets you write it to a local Python file with the %%writefile magic command for subsequent logging and deployment.
The extraction agent implements MLflow's PythonModel interface, which allows it to be used as a Spark User-Defined Function (UDF) for batch inference.
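The sketch below shows one possible shape of such an agent. It is a minimal illustration rather than the notebook's actual implementation: the ExtractionAgent class, the endpoint name, the prompt, and the text column are placeholders, and the LLM is called through the MLflow Deployments client against a Databricks model serving chat endpoint.

```python
# In the notebook, prefix this cell with `%%writefile extractor.py` so the agent
# code is written to a local file that can be logged as code.
import pandas as pd
import mlflow
from mlflow.deployments import get_deploy_client

# Placeholder endpoint name; use a chat model serving endpoint from your workspace.
LLM_ENDPOINT = "databricks-meta-llama-3-1-70b-instruct"

SYSTEM_PROMPT = (
    "Extract the employer and employee names from the contract below. "
    'Respond only with JSON of the form {"employer": "...", "employee": "..."}.'
)

class ExtractionAgent(mlflow.pyfunc.PythonModel):
    """Minimal structured-extraction agent implementing the PythonModel interface."""

    def predict(self, context, model_input: pd.DataFrame) -> pd.Series:
        client = get_deploy_client("databricks")
        responses = []
        for doc in model_input["text"]:
            result = client.predict(
                endpoint=LLM_ENDPOINT,
                inputs={
                    "messages": [
                        {"role": "system", "content": SYSTEM_PROMPT},
                        {"role": "user", "content": doc},
                    ],
                    "temperature": 0.0,
                },
            )
            # Chat endpoints return an OpenAI-style payload.
            responses.append(result["choices"][0]["message"]["content"])
        return pd.Series(responses)

# Required for MLflow Models from Code: declare which object is the model.
mlflow.models.set_model(ExtractionAgent())
```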
Log the agent as an MLflow Model
Log the agent as code from the extractor.py file. See MLflow Models from Code.
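A minimal sketch of that logging call is shown below; the artifact path, input example, and variable names are illustrative only.

```python
import mlflow
import pandas as pd

with mlflow.start_run() as run:
    model_info = mlflow.pyfunc.log_model(
        artifact_path="extraction_agent",   # illustrative artifact path
        python_model="extractor.py",        # Models from Code: pass the file path
        input_example=pd.DataFrame({"text": ["Sample employment contract ..."]}),
    )

run_id = run.info.run_id          # reused later when running the Spark UDF
model_uri = model_info.model_uri  # e.g. runs:/<run_id>/extraction_agent
```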
Batch inference & evaluation
To assess the agent's performance, create a simulated dataset of employment contracts. This dummy dataset will serve as a testbed for entity extraction, focusing on key information such as employer and employee names.
The notebook then uses this dataset for batch inference testing, applying the logged agent model as a Spark User-Defined Function (UDF). This approach evaluates how effectively the agent processes multiple documents simultaneously and extracts the relevant entities at scale.
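A rough sketch of this step is shown below, assuming the hypothetical contract texts, column names, and model_uri from the snippets above; spark and display are the Databricks notebook built-ins.

```python
import mlflow

# Hypothetical dummy dataset: contract text plus the entities expected in the response.
contracts = [
    ("Employment contract between Acme Corp (employer) and Jane Doe (employee) ...",
     ["Acme Corp", "Jane Doe"]),
    ("Employment contract between Globex LLC (employer) and John Smith (employee) ...",
     ["Globex LLC", "John Smith"]),
]
df = spark.createDataFrame(contracts, ["text", "expected_entities"])

# Wrap the logged model as a Spark UDF and apply it to every document.
extract_udf = mlflow.pyfunc.spark_udf(spark, model_uri, result_type="string")
predictions_df = df.withColumn("response", extract_udf("text"))
display(predictions_df)
```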
Tracing
Note: In the next cell, pass the run_id of the active experiment run to the Spark UDF. The agent model uses this run_id to log its traces. Navigate to the MLflow experiment run to inspect the traces for each LLM request; in addition to the full request and response, they contain the token statistics defined in the agent model.
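One possible way to pass the run ID, sketched under the assumption that the agent's predict method accepts an additional run_id column (the simplified extractor sketch above omits this), is to supply it as a constant column:

```python
from pyspark.sql import functions as F

# Assumes the agent reads a `run_id` column and uses it when logging traces.
traced_df = df.withColumn("response", extract_udf("text", F.lit(run_id)))
display(traced_df)
```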
Evaluate the agent with Agent Evaluation
To assess the agent's quality, use the Agent Evaluation framework (AWS | Azure). This approach employs a correctness judge to compare expected entities (or facts) with the actual response, providing a comprehensive evaluation of the agent's performance.
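A rough sketch of such an evaluation call, reusing the hypothetical predictions_df from above and renaming its columns to the input schema Agent Evaluation expects:

```python
import mlflow

# Map the batch-inference output onto the Agent Evaluation input schema.
eval_df = predictions_df.toPandas().rename(
    columns={"text": "request", "expected_entities": "expected_facts"}
)

eval_results = mlflow.evaluate(
    data=eval_df,                    # request / response / expected_facts
    model_type="databricks-agent",   # built-in LLM judges, including correctness
)
display(eval_results.tables["eval_results"])
```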
Note: An alternative approach would be to compute metrics such as recall and precision for individual entities, though this would require additional data transformations or custom metrics.
Next steps
If the evaluation is successful, the next step would be to register the model in Unity Catalog for use in production. For more information about deploying generative AI and machine learning models to production, refer to the Big Book of MLOps.
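As a minimal sketch, registration could look like the following; the three-level catalog.schema.model name is a placeholder.

```python
import mlflow

mlflow.set_registry_uri("databricks-uc")  # target Unity Catalog rather than the workspace registry
mlflow.register_model(
    model_uri=model_uri,                            # from the logging step above
    name="main.default.contract_extraction_agent",  # placeholder UC model name
)
```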
For further insights and related examples of structured data extraction on Databricks, consider exploring these comprehensive technical blog posts: