automl-forecasting-example

COVID-19 dataset

import pyspark.pandas as ps

df = ps.read_csv("/databricks-datasets/COVID/covid-19-data")
# Parse dates; values that cannot be parsed become NaT instead of raising an error
df["date"] = ps.to_datetime(df["date"], errors="coerce")
df["cases"] = df["cases"].astype(int)
display(df)
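pandas-on-Spark mirrors pandas semantics for the cleaning steps above. The following minimal sketch uses plain pandas on a small hypothetical frame (not the real dataset) to show that `errors="coerce"` turns an unparsable date into `NaT`. The `dropna` step is an extra safeguard that the original cell does not perform.

```python
import pandas as pd

# Hypothetical rows standing in for the COVID-19 CSV data.
raw = pd.DataFrame({
    "date": ["2020-03-01", "2020-03-02", "not a date"],
    "cases": ["10", "15", "7"],
})

clean = raw.copy()
# errors="coerce" converts unparsable strings to NaT instead of raising.
clean["date"] = pd.to_datetime(clean["date"], errors="coerce")
print(clean["date"].isna().tolist())  # [False, False, True]

# Extra step, not in the original cell: drop rows without a valid date,
# then cast the target column to integers.
clean = clean.dropna(subset=["date"]).astype({"cases": int})
print(clean)
```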

AutoML training

The following command starts an AutoML run. You must provide the column that the model should predict in the target_col argument and the time column in the time_col argument. When the run completes, you can follow the link to the best trial notebook to examine the training code.

This example also specifies:

horizon=30 to specify that AutoML should forecast 30 days into the future.
frequency="d" to specify that a forecast should be provided for each day.
primary_metric="mdape" to specify the metric to optimize for during training (MdAPE, the median absolute percentage error).
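With horizon=30 and frequency="d", the forecast covers 30 consecutive daily timestamps. A minimal sketch of which dates that is, assuming the forecast starts one day after the last timestamp in the training data; the function name `forecast_dates` and the example date are hypothetical.

```python
import pandas as pd

def forecast_dates(last_observed: str, horizon: int = 30) -> pd.DatetimeIndex:
    """Timestamps covered by a daily forecast starting the day after last_observed."""
    # frequency="d": one forecast per calendar day.
    start = pd.Timestamp(last_observed) + pd.Timedelta(days=1)
    return pd.date_range(start=start, periods=horizon, freq="D")

dates = forecast_dates("2022-05-01")
print(len(dates), dates[0].date(), dates[-1].date())  # 30 2022-05-02 2022-05-31
```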

import databricks.automl
import logging

# Hide informational log messages from py4j by raising its log level to WARNING
logging.getLogger("py4j").setLevel(logging.WARNING)

# Note: If you are running Databricks Runtime for Machine Learning 10.4 or below, use this line instead:
# summary = databricks.automl.forecast(df, target_col="cases", time_col="date", horizon=30, frequency="d", primary_metric="mdape")

# output_database: AutoML saves the best model's predictions to a new table in this database
summary = databricks.automl.forecast(df, target_col="cases", time_col="date", horizon=30, frequency="d", primary_metric="mdape", output_database="default")
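Because of primary_metric="mdape", AutoML ranks trials by median absolute percentage error. The following is a minimal NumPy sketch of the metric, assuming the common definition (median of |y - ŷ| / |y|, returned as a fraction) and assuming rows with a zero actual value are skipped. How AutoML itself handles zeros and scaling is not stated here.

```python
import numpy as np

def mdape(y_true, y_pred) -> float:
    """Median absolute percentage error as a fraction (multiply by 100 for percent).

    Assumption: rows where y_true == 0 are skipped, since the ratio is undefined there.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mask = y_true != 0
    ratios = np.abs((y_true[mask] - y_pred[mask]) / y_true[mask])
    return float(np.median(ratios))

print(mdape([100, 200, 400], [110, 180, 400]))  # 0.1
```

Using the median rather than the mean keeps a few days with very large relative errors from dominating the score.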

Load predictions from the best model

# Load the saved predictions.
forecast_pd = spark.table(summary.output_table_name)
display(forecast_pd)

Load the model with MLflow

import mlflow.pyfunc

# MLflow run ID of the best trial from the AutoML run
trial_id = summary.best_trial.mlflow_run_id

# Load the model logged under that run as an MLflow pyfunc model
model_uri = "runs:/{run_id}/model".format(run_id=trial_id)
pyfunc_model = mlflow.pyfunc.load_model(model_uri)

Use the model to make forecasts

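# predict_timeseries() is defined on the underlying AutoML Python model, so it is
# reached through the pyfunc wrapper's internal _model_impl attribute
forecasts = pyfunc_model._model_impl.python_model.predict_timeseries()
display(forecasts)

# On Databricks Runtime for Machine Learning 10.5 or above, pass include_history=False to return only future dates:
# forecasts = pyfunc_model._model_impl.python_model.predict_timeseries(include_history=False)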
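On runtimes without the include_history option, one way to approximate it is to filter the full output by date. A minimal sketch, assuming the output is a pandas DataFrame with a `ds` timestamp column (as used in the plotting cell) and that you know the last observed date; the helper `future_only` and the sample frame are hypothetical.

```python
import pandas as pd

def future_only(forecasts: pd.DataFrame, last_observed: str) -> pd.DataFrame:
    """Keep only forecast rows dated after the last observed timestamp."""
    cutoff = pd.Timestamp(last_observed)
    return forecasts[forecasts["ds"] > cutoff].reset_index(drop=True)

# Hypothetical output: two historical days followed by two future days.
fcst = pd.DataFrame({
    "ds": pd.date_range("2022-05-01", periods=4, freq="D"),
    "yhat": [1.0, 2.0, 3.0, 4.0],
})
print(future_only(fcst, "2022-05-02"))
```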

Plot the forecasted points

# Ground truth for the plot: average observed cases per date, converted to pandas
df_true = df.groupby("date").agg(y=("cases", "avg")).reset_index().to_pandas()
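The ground-truth line uses named aggregation: `y=("cases", ...)` creates an output column `y` from the per-date aggregate of `cases`. A minimal plain-pandas sketch on hypothetical rows with more than one record per date. Note that plain pandas expects the function name "mean", whereas the notebook passes "avg" to pandas-on-Spark.

```python
import pandas as pd

# Hypothetical observations with two rows on the same date.
obs = pd.DataFrame({
    "date": pd.to_datetime(["2020-03-01", "2020-03-01", "2020-03-02"]),
    "cases": [10, 30, 50],
})

# Named aggregation: output column y holds the mean of cases for each date.
df_true = obs.groupby("date").agg(y=("cases", "mean")).reset_index()
print(df_true)
```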

import matplotlib.pyplot as plt

fig = plt.figure(facecolor='w', figsize=(10, 6))
ax = fig.add_subplot(111)
# Include historical dates so the fitted values line up with the observed data
forecasts = pyfunc_model._model_impl.python_model.predict_timeseries(include_history=True)
fcst_t = forecasts['ds'].dt.to_pydatetime()
ax.plot(df_true['date'].dt.to_pydatetime(), df_true['y'], 'k.', label='Observed data points')
ax.plot(fcst_t, forecasts['yhat'], ls='-', c='#0072B2', label='Forecasts')
ax.fill_between(fcst_t, forecasts['yhat_lower'], forecasts['yhat_upper'],
                color='#0072B2', alpha=0.2, label='Uncertainty interval')
ax.legend()
plt.show()
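The plot shades the uncertainty interval between `yhat_lower` and `yhat_upper`. To put a number on how well that band matches the data, a hypothetical helper can measure the share of observed points that fall inside it. This minimal sketch reuses the column names from the plotting cell (`date`, `y`, `ds`, `yhat_lower`, `yhat_upper`) and runs on small made-up frames.

```python
import pandas as pd

def interval_coverage(df_true: pd.DataFrame, forecasts: pd.DataFrame) -> float:
    """Fraction of observed points inside [yhat_lower, yhat_upper] on matching dates."""
    merged = df_true.merge(forecasts, left_on="date", right_on="ds", how="inner")
    inside = (merged["y"] >= merged["yhat_lower"]) & (merged["y"] <= merged["yhat_upper"])
    return float(inside.mean())

# Hypothetical observed values and forecasts (the last forecast date has no observation).
truth = pd.DataFrame({
    "date": pd.date_range("2022-05-01", periods=3, freq="D"),
    "y": [10.0, 20.0, 30.0],
})
fc = pd.DataFrame({
    "ds": pd.date_range("2022-05-01", periods=4, freq="D"),
    "yhat": [11.0, 19.0, 40.0, 41.0],
    "yhat_lower": [8.0, 15.0, 35.0, 36.0],
    "yhat_upper": [12.0, 25.0, 45.0, 46.0],
})
print(interval_coverage(truth, fc))
```

The inner merge restricts the comparison to dates that have both an observation and a forecast, so future-only rows are ignored.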
