feature-store-time-series-example(Python)


Feature Store Time Series Feature Table

In this notebook, you create time series feature tables based on simulated Internet of Things (IoT) sensor data. You then:

  • Generate a training set by performing a point-in-time lookup on the time series feature tables.
  • Use the training set to train a model.
  • Register the model.
  • Perform batch inference on new sensor data.

Requirements

  • This notebook is intended for workspaces that are not enabled for Unity Catalog. If your workspace is enabled for Unity Catalog, use the version of this notebook designed for Unity Catalog (AWS | Azure | GCP).
  • Databricks Runtime 10.4 LTS for Machine Learning or above.

Note: Starting with Databricks Runtime 13.2 ML, a change was made to the create_table API. Timestamp key columns must now be specified in the primary_keys argument. If you are using this notebook with Databricks Runtime 13.1 ML or below, use the commented-out code for the create_table call in Cmd 9.

Background

The data used in this notebook is simulated to represent this situation: you have a series of readings from a set of IoT sensors installed in different rooms of a warehouse. You want to use this data to train a model that can detect when a person has entered a room. Each room has a temperature sensor, a light sensor, and a CO2 sensor, each of which records data at a different frequency.

Database name: point_in_time_demo_50316e10366349078610bfb4b3bd28dd
Model name: pit_demo_model_50316e10366349078610bfb4b3bd28dd

Generate the simulated dataset

In this step, you generate the simulated dataset and then create four Spark DataFrames, one each for the light sensors, the temperature sensors, the CO2 sensors, and the ground truth.
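The generation step can be sketched as follows. This is a minimal stdlib-only illustration, not the notebook's actual code: the helper name `simulate_readings` and the column labels are hypothetical, and in the notebook the resulting rows become Spark DataFrames (for example via `spark.createDataFrame(...)`).

```python
# Hypothetical sketch: each sensor type emits readings at its own frequency,
# keyed by room and timestamp. Names and frequencies are illustrative only.
import random
from datetime import datetime, timedelta

def simulate_readings(rooms, start, periods, freq_seconds, value_fn, value_col):
    """Return a list of row dicts: one reading per room per tick."""
    rows = []
    for room in rooms:
        for i in range(periods):
            ts = start + timedelta(seconds=i * freq_seconds)
            rows.append({"room": room, "ts": ts, value_col: value_fn()})
    return rows

rooms = [0, 1, 2]
start = datetime(2024, 8, 11)
# Each sensor type records at a different frequency.
temp = simulate_readings(rooms, start, periods=60, freq_seconds=60,
                         value_fn=lambda: random.gauss(22.0, 1.5), value_col="temp_c")
light = simulate_readings(rooms, start, periods=12, freq_seconds=300,
                          value_fn=lambda: random.uniform(0, 800), value_col="lux")
co2 = simulate_readings(rooms, start, periods=6, freq_seconds=600,
                        value_fn=lambda: random.gauss(450, 50), value_col="ppm")
```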

Create the time series feature tables

In this step, you create the time series feature tables. Each table uses the room as the primary key and the reading timestamp as the timestamp key.

2024/08/11 22:32:12 INFO databricks.ml_features._compute_client._compute_client: Created feature table 'hive_metastore.point_in_time_demo_50316e10366349078610bfb4b3bd28dd.temp_sensors'.
2024/08/11 22:32:29 INFO databricks.ml_features._compute_client._compute_client: Created feature table 'hive_metastore.point_in_time_demo_50316e10366349078610bfb4b3bd28dd.light_sensors'.
2024/08/11 22:32:44 INFO databricks.ml_features._compute_client._compute_client: Created feature table 'hive_metastore.point_in_time_demo_50316e10366349078610bfb4b3bd28dd.co2_sensors'.
<FeatureTable: name='point_in_time_demo_50316e10366349078610bfb4b3bd28dd.co2_sensors', table_id='8a91fbbdf3614a22aed89e17439240bb', description='Readings from CO2 sensors', primary_keys=['r', 'co2_ts'], partition_columns=[], features=['co2_ts', 'ppm', 'r'], creation_timestamp=1723415553482, online_stores=[], notebook_producers=[notebook_id: 4161531970048598 revision_id: 1723415563747 creation_timestamp: 1723415564210 creator_id: "andrea.kress@databricks.com" notebook_workspace_id: 8498204313176882 feature_table_workspace_id: 8498204313176882 notebook_workspace_url: "https://db-sme-demo-docs.cloud.databricks.com" producer_action: CREATE ], job_producers=[], table_data_sources=[], path_data_sources=[], custom_data_sources=[], timestamp_keys=['co2_ts'], tags={}>

The time series feature tables are now visible in the Feature Store UI. The Timestamp Keys field is populated for these feature tables.

Update the time series feature tables

Suppose that after you create the feature table, you receive updated values. For example, maybe some temperature readings were incorrectly pre-processed and need to be updated in the temperature time series feature table.

When you write a DataFrame to a time series feature table, the DataFrame must specify all the features of the feature table. To update a single feature column in the time series feature table, you must first join the updated feature column with the other features in the table, specifying both a primary key and a timestamp key. Then, you can update the feature table.
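The join described above can be sketched with pandas standing in for Spark. This is an illustration only: the two-feature table (`temp_c`, `humidity`) and all names here are hypothetical, and the final write-back in the notebook would use the Feature Store client rather than pandas.

```python
# Hypothetical sketch: corrections arrive for temp_c only, so join them with
# the remaining features on the primary key + timestamp key before writing back.
import pandas as pd

feature_table = pd.DataFrame({
    "room": [0, 0, 1],
    "ts": pd.to_datetime(["2024-08-11 00:00", "2024-08-11 00:05", "2024-08-11 00:00"]),
    "temp_c": [21.0, 22.5, 20.0],
    "humidity": [0.40, 0.42, 0.38],
})

corrections = pd.DataFrame({
    "room": [0],
    "ts": pd.to_datetime(["2024-08-11 00:05"]),
    "temp_c": [23.0],  # re-processed reading
})

# The update DataFrame must carry every feature column, so join the corrected
# column with the untouched features on (room, ts).
update_df = corrections.merge(
    feature_table.drop(columns=["temp_c"]), on=["room", "ts"], how="left"
)
# In the notebook, the Spark equivalent of update_df is then written back
# to the feature table (merge semantics).
```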

Create a training set with point-in-time lookups on time series feature tables

In this step, you create a training set from the ground truth data by performing point-in-time lookups on the sensor data in the time series feature tables.

For each room and timestamp in the ground truth data, the point-in-time lookup retrieves the latest sensor value recorded at or before that timestamp.
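The as-of semantics of the lookup can be illustrated with `pandas.merge_asof`, which behaves like the point-in-time join the Feature Store performs. This is a conceptual stand-in with hypothetical column names, not the notebook's actual lookup code.

```python
# Illustration of point-in-time lookup semantics using pandas.merge_asof.
import pandas as pd

# Ground truth: labeled observations at arbitrary times per room.
ground_truth = pd.DataFrame({
    "room": [0, 0, 1],
    "ts": pd.to_datetime(["2024-08-11 00:05:30", "2024-08-11 00:12:10",
                          "2024-08-11 00:07:00"]),
    "person": [0, 1, 1],
})

# Temperature "feature table": readings every 5 minutes per room.
temps = pd.DataFrame({
    "room": [0, 0, 0, 1, 1],
    "ts": pd.to_datetime(["2024-08-11 00:00", "2024-08-11 00:05", "2024-08-11 00:10",
                          "2024-08-11 00:00", "2024-08-11 00:05"]),
    "temp_c": [21.0, 22.5, 23.1, 20.0, 20.4],
})

# merge_asof requires sorting by the time key; "by" matches the primary key.
# direction="backward" picks the latest reading at or before the label time.
train = pd.merge_asof(
    ground_truth.sort_values("ts"),
    temps.sort_values("ts"),
    on="ts", by="room", direction="backward",
)
```

Each label row ends up paired with the most recent temperature reading for its room, which is exactly what the point-in-time lookup does across all three sensor tables at once.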

Train the model

2024/08/11 22:33:23 INFO mlflow.utils.autologging_utils: Created MLflow autologging run with ID '345c93ec0df845a58dfbd3a0b4a5dc1b', which will track hyperparameters, performance metrics, model artifacts, and lineage information for the current lightgbm workflow
[LightGBM] [Info] Number of positive: 7021, number of negative: 36446
[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001138 seconds. You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 1020
[LightGBM] [Info] Number of data points in the train set: 43467, number of used features: 4
[LightGBM] [Info] [binary:BoostFromScore]: pavg=0.161525 -> initscore=-1.646926
[LightGBM] [Info] Start training from score -1.646926
2024/08/11 22:33:29 WARNING mlflow.utils.autologging_utils: MLflow autologging encountered a warning: "/databricks/python/lib/python3.10/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils."

2024/08/11 22:33:36 WARNING mlflow.models.model: Model logged without a signature. Signatures will be required for upcoming model registry features as they validate model inputs and denote the expected schema of model outputs. Please visit https://www.mlflow.org/docs/2.9.2/models.html#set-signature-on-logged-model for instructions on setting a model signature on your logged model.
2024/08/11 22:33:36 INFO mlflow.store.artifact.cloud_artifact_repo: The progress bar can be disabled by setting the environment variable MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR to false
Successfully registered model 'pit_demo_model_50316e10366349078610bfb4b3bd28dd'.
2024/08/11 22:33:38 INFO mlflow.store.model_registry.abstract_store: Waiting up to 300 seconds for model version to finish creation. Model name: pit_demo_model_50316e10366349078610bfb4b3bd28dd, version 1
Created version '1' of model 'pit_demo_model_50316e10366349078610bfb4b3bd28dd'.

Score data with point-in-time lookups on time series feature tables

The point-in-time lookup metadata used to create the training set is packaged with the model, so the same lookup can be performed during scoring.
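Conceptually, the logged model bundles the lookup specification with the prediction function, so callers pass only the keys and timestamps. The sketch below is a pure-Python stand-in with hypothetical names (`PackagedModel`, the stub occupancy predictor); the notebook itself uses the Feature Store client's batch-scoring API on the registered MLflow model.

```python
# Conceptual sketch: a "packaged" model that carries its own as-of lookup,
# so scoring only needs (room, ts) keys. All names here are hypothetical.
from bisect import bisect_right

class PackagedModel:
    """Bundles a prediction fn with the point-in-time lookup it was trained with."""
    def __init__(self, predict_fn, feature_rows):
        self.predict_fn = predict_fn
        # index: room -> sorted list of (ts, value) readings
        self.index = {}
        for room, ts, value in feature_rows:
            self.index.setdefault(room, []).append((ts, value))
        for readings in self.index.values():
            readings.sort()

    def lookup(self, room, ts):
        """Latest feature value at or before ts for this room (as-of semantics)."""
        readings = self.index[room]
        i = bisect_right(readings, (ts, float("inf")))
        return readings[i - 1][1]

    def score_batch(self, key_rows):
        # Caller supplies only (room, ts); features are looked up automatically.
        return [self.predict_fn(self.lookup(room, ts)) for room, ts in key_rows]

model = PackagedModel(
    predict_fn=lambda temp: int(temp > 23.0),   # stand-in "occupancy" model
    feature_rows=[(0, 0, 21.0), (0, 300, 23.5), (1, 0, 20.0)],  # ts in seconds
)
preds = model.score_batch([(0, 310), (1, 100)])
```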

2024/08/11 22:33:46 INFO mlflow.store.artifact.artifact_repo: The progress bar can be disabled by setting the environment variable MLFLOW_ENABLE_ARTIFACTS_PROGRESS_BAR to false
2024/08/11 22:33:47 WARNING mlflow.pyfunc: Calling `spark_udf()` with `env_manager="local"` does not recreate the same environment that was used during training, which may lead to errors or inaccurate predictions. We recommend specifying `env_manager="conda"`, which automatically recreates the environment that was used to train the model and performs inference in the recreated environment.
2024/08/11 22:33:47 INFO mlflow.models.flavor_backend_registry: Selected backend for flavor 'python_function'