You can also use create_table without providing a DataFrame, and then later populate the feature table using fe.write_table.
Example:
fe.create_table(
    name=table_name,
    primary_keys=["wine_id"],
    schema=features_df.schema,
    description="wine features"
)
fe.write_table(
    name=table_name,
    df=features_df,
    mode="merge"
)
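As a rough illustration of what mode="merge" does, the plain-Python sketch below upserts rows by primary key: rows whose wine_id already exists are updated, and rows with a new wine_id are added. The merge_rows helper and the alcohol column are hypothetical and only model the semantics; the real write goes through fe.write_table on a Delta table.

```python
def merge_rows(existing_rows, new_rows, key="wine_id"):
    """Conceptual model of an upsert keyed on the primary key.

    This is NOT the Databricks implementation; it only shows the
    update-or-insert behavior that a merge-style write provides.
    """
    # Index the current table contents by primary key.
    merged = {row[key]: dict(row) for row in existing_rows}
    for row in new_rows:
        # Update the matching row if the key exists, otherwise insert it.
        merged.setdefault(row[key], {}).update(row)
    return list(merged.values())


table = [
    {"wine_id": 1, "alcohol": 9.4},
    {"wine_id": 2, "alcohol": 10.0},
]
updates = [
    {"wine_id": 2, "alcohol": 11.0},  # existing key: updated in place
    {"wine_id": 3, "alcohol": 12.5},  # new key: inserted
]
result = merge_rows(table, updates)
```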
The feature table does not include the prediction target. However, the training dataset needs the prediction target values. There may also be features that are not available until the time the model is used for inference.
This example uses the feature real_time_measurement to represent a characteristic of the wine that can only be observed at inference time. This feature is used in training, and its value for a given wine is provided at inference time.
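One way to assemble such a training set is sketched below, assuming a Spark DataFrame that holds wine_id, real_time_measurement, and the label. The helper name build_training_set, the label column name "quality", and the lookup_cls parameter (which exists only so the function can be exercised without the Databricks library) are assumptions for illustration, not part of the original notebook.

```python
def build_training_set(fe, labels_df, table_name, label="quality", lookup_cls=None):
    """Combine labels and inference-time inputs with looked-up features.

    fe         -- a FeatureEngineeringClient instance
    labels_df  -- DataFrame with wine_id, real_time_measurement, and the label
    table_name -- full name of the Unity Catalog feature table
    label      -- label column name (assumed to be "quality" here)
    """
    if lookup_cls is None:
        # Imported lazily so the sketch can be read and tested off-cluster.
        from databricks.feature_engineering import FeatureLookup as lookup_cls

    # Look up every feature in the table, matching rows on the primary key.
    lookups = [lookup_cls(table_name=table_name, lookup_key="wine_id")]

    return fe.create_training_set(
        df=labels_df,
        feature_lookups=lookups,
        label=label,
        # The key is needed for the lookup but is not a model input.
        exclude_columns=["wine_id"],
    )
```

In a notebook, this would be called as build_training_set(fe, labels_df, table_name), and the returned training set's load_df() method gives a DataFrame that can be converted for scikit-learn.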
The code in the next cell trains a scikit-learn RandomForestRegressor model and logs the model with Feature Engineering in UC.
The code starts an MLflow experiment to track training parameters and results. Note that model autologging is disabled (mlflow.sklearn.autolog(log_models=False)) because the model is logged using fe.log_model.
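The pattern described here can be sketched as follows. This is a minimal outline under stated assumptions, not the notebook's actual cell: the helper name train_and_log, the artifact path "model", and the mlflow_module parameter (present only so the function can be exercised without MLflow installed) are hypothetical. The model argument is any scikit-learn estimator, such as a RandomForestRegressor.

```python
def train_and_log(fe, model, X_train, y_train, training_set,
                  registered_model_name, mlflow_module=None):
    """Fit a model inside an MLflow run and log it with feature metadata."""
    if mlflow_module is None:
        # Imported lazily so the sketch can be read and tested off-cluster.
        import mlflow as mlflow_module

    # Autologging still records parameters and metrics, but it must not
    # log the model itself: fe.log_model does that, packaged with the
    # feature lookup information from the training set.
    mlflow_module.sklearn.autolog(log_models=False)

    with mlflow_module.start_run():
        model.fit(X_train, y_train)
        fe.log_model(
            model=model,
            artifact_path="model",  # assumed artifact path
            flavor=mlflow_module.sklearn,
            training_set=training_set,
            registered_model_name=registered_model_name,
        )
    return model
```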
To view the logged model, navigate to the MLflow Experiments page for this notebook. To access the Experiments page, click the Experiments icon on the left navigation bar.
Find the notebook experiment in the list. It has the same name as the notebook, in this case, "Basic example for Feature Engineering in Unity Catalog".
Click the experiment name to display the experiment page. The packaged Feature Engineering in UC model, created when you called fe.log_model, appears in the Artifacts section of this page. You can use this model for batch scoring.
The model is also automatically registered in Unity Catalog.
Batch scoring
Use score_batch to apply a packaged Feature Engineering in UC model to new data for inference. The input data only needs the primary key column wine_id and the realtime feature real_time_measurement. The model automatically looks up all of the other feature values from the feature tables.
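To make the lookup step concrete, the plain-Python sketch below shows conceptually how the model's inputs are assembled from a scoring batch and a feature table. It is not how score_batch is implemented: the assemble_model_inputs helper and the alcohol column are hypothetical, and the handling of a wine_id with no stored features is an assumption of the sketch.

```python
def assemble_model_inputs(batch_rows, feature_table, key="wine_id"):
    """Conceptual model of feature lookup at scoring time."""
    # Index the stored features by primary key.
    features_by_key = {row[key]: row for row in feature_table}
    assembled = []
    for row in batch_rows:
        # Assumption: a key with no stored features contributes nothing.
        looked_up = features_by_key.get(row[key], {})
        # Values supplied in the batch (the realtime feature) are combined
        # with the looked-up static features.
        assembled.append({**looked_up, **row})
    return assembled


feature_table = [{"wine_id": 1, "alcohol": 9.4}]
batch = [{"wine_id": 1, "real_time_measurement": 0.5}]
inputs = assemble_model_inputs(batch, feature_table)
```

In the notebook itself, the corresponding call would be along the lines of fe.score_batch(model_uri=model_uri, df=batch_df), where model_uri and batch_df are placeholder names for the registered model's URI and a DataFrame containing only wine_id and real_time_measurement.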
Control permissions for and delete feature tables
- To control who has access to a Unity Catalog feature table, use the Permissions button on the Catalog Explorer table details page.
- To delete a Unity Catalog feature table, click the kebab menu on the Catalog Explorer table details page and select Delete. When you delete a Unity Catalog feature table using the UI, the corresponding Delta table is also deleted.
Basic example for Feature Engineering in Unity Catalog
This notebook illustrates how you can use Databricks Feature Engineering in Unity Catalog to create, store, and manage Unity Catalog features to train ML models and make batch predictions, including with features whose values are only available at the time of prediction. In this example, the goal is to predict wine quality using an ML model with a variety of static wine features and a realtime input.
This notebook shows how to:
Requirements
Run %pip install databricks-feature-engineering at the start of this notebook.