Forecasting (serverless) with AutoML
Preview
This feature is in Public Preview.
This article shows you how to run a serverless forecasting experiment using the Mosaic AI Model Training UI.
Mosaic AI Model Training - forecasting simplifies forecasting time-series data by automatically selecting the best algorithm and hyperparameters, all while running on fully managed compute resources.
To understand the difference between serverless forecasting and classic compute forecasting, see Serverless forecasting vs. classic compute forecasting.
Requirements
Training data with a time series column, saved as a Unity Catalog table.
If the workspace has Secure Egress Gateway (SEG) enabled, pypi.org must be added to the Allowed domains list. See Managing network policies for serverless egress control.
Create a forecasting experiment with the UI
Go to your Databricks landing page and click Experiments in the sidebar.
In the Forecasting tile, select Start training.
Select the Training data from a list of Unity Catalog tables that you can access.
Time column: Select the column containing the time periods for the time series. The column must be of type timestamp or date.
Forecast frequency: Select the time unit that represents your input data’s frequency, for example, minutes, hours, days, or months. This determines the granularity of your time series.
Forecast horizon: Specify how many units of the selected frequency to forecast into the future. Together with the forecast frequency, this defines both the time units and the number of time units to forecast.
Note
To use the Auto-ARIMA algorithm, the time series must have a regular frequency, meaning the interval between any two consecutive points is the same throughout the time series. AutoML handles missing time steps by filling in those values with the previous value.
Select a Prediction target column that you want the model to predict.
Optionally, specify a Unity Catalog table Prediction data path to store the output forecasts.
Select a Model registration Unity Catalog location and name.
Optionally, set Advanced options:
Experiment name: Provide an MLflow experiment name.
Time series identifier columns: For multi-series forecasting, select the column(s) that identify the individual time series. Databricks groups the data by these columns as different time series and trains a model for each series independently.
Primary metric: Choose the primary metric used to evaluate and select the best model.
Training framework: Choose the frameworks for AutoML to explore.
Split column: Select the column containing a custom data split. Values must be “train”, “validate”, or “test”.
Weight column: Specify the column to use for weighting time series. All samples for a given time series must have the same weight. The weight must be in the range [0, 10000].
Holiday region: Select the holiday region to use as covariates in model training.
Timeout: Set a maximum duration for the AutoML experiment.
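The column requirements in the steps above can be checked on a sample of the training data before starting an experiment. The following is a minimal pandas sketch, not AutoML's own implementation. The function names and the column names (date, store, units, split, weight) are hypothetical and chosen only for illustration. It shows how forward-filling a missing time step behaves, which timestamps a given frequency and horizon cover, and how to flag split or weight values that fall outside the documented constraints.

```python
import pandas as pd

ALLOWED_SPLITS = {"train", "validate", "test"}


def fill_missing_steps(df, time_col, freq):
    """Reindex a single series to a regular frequency and forward-fill gaps."""
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    out = out.set_index(time_col).sort_index()
    # asfreq inserts a row for every missing time step;
    # ffill copies the previous value into it.
    return out.asfreq(freq).ffill().reset_index()


def forecast_timestamps(last_time, freq, horizon):
    """Timestamps covered by a forecast of `horizon` units of `freq`."""
    return pd.date_range(start=last_time, periods=horizon + 1, freq=freq)[1:]


def check_advanced_columns(df, id_cols, split_col=None, weight_col=None):
    """Return a list of problems found in the optional split and weight columns."""
    problems = []
    if split_col is not None:
        bad = set(df[split_col].dropna().unique()) - ALLOWED_SPLITS
        if bad:
            problems.append(f"unexpected split values: {sorted(bad)}")
    if weight_col is not None:
        if not df[weight_col].between(0, 10000).all():
            problems.append("weights outside [0, 10000]")
        if (df.groupby(id_cols)[weight_col].nunique() > 1).any():
            problems.append("weight varies within a time series")
    return problems


# Hypothetical sample: daily data with 2024-01-03 missing,
# an invalid split label, and a weight that changes within one series.
sales = pd.DataFrame({
    "date": ["2024-01-01", "2024-01-02", "2024-01-04"],
    "store": ["a", "a", "a"],
    "units": [10, 12, 15],
    "split": ["train", "train", "validation"],
    "weight": [1.0, 1.0, 2.0],
})

regular = fill_missing_steps(sales[["date", "units"]], "date", "D")
future = forecast_timestamps(regular["date"].max(), "D", 3)
problems = check_advanced_columns(sales, ["store"], "split", "weight")
```

In this sample, the missing day is filled with the previous day's value, a daily frequency with a horizon of 3 covers the three days after the last observation, and the check reports both the invalid split label and the inconsistent weight.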
Run the experiment and monitor the results
To start the AutoML experiment, click Start training. From the experiment training page, you can do the following:
Stop the experiment at any time.
Monitor runs.
Navigate to the run page for any run.
View results or use the best model
After training completes, the prediction results are stored in the specified Delta table and the best model is registered to Unity Catalog.
From the experiments page, you can choose from the following next steps:
Select View predictions to see the forecasting results table.
Select Batch inference notebook to open an auto-generated notebook for batch inference using the best model.
Select Create serving endpoint to deploy the best model to a Model Serving endpoint.
Serverless forecasting vs. classic compute forecasting
The following table summarizes the differences between serverless forecasting and forecasting with classic compute:
Feature | Serverless forecasting | Classic compute forecasting |
---|---|---|
Compute infrastructure | Databricks manages compute configuration and automatically optimizes for cost and performance. | User-configured compute |
Governance | Models and artifacts registered to Unity Catalog | User-configured workspace file store |
Algorithm selection | Statistical models plus the deep learning neural net algorithm DeepAR | |
Feature store integration | Not supported | |
Auto-generated notebooks | Batch inference notebook | Source code for all trials |
One-click model serving deployment | Supported | Unsupported |
Custom train/validate/test splits | Supported | Not supported |
Custom weights for individual time series | Supported | Not supported |