AutoML classification example

Requirements

Databricks Runtime for Machine Learning.

Census income dataset

This dataset contains data from the 1994 census database. Each row represents a group of individuals. The goal is to determine whether a group has an income of more than 50K a year. This classification is represented as a string in the income column, with the value <=50K or >50K.

Train/test split
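As a minimal sketch of this step, assuming the census data has already been loaded into a Spark DataFrame, the split could look like the following. The helper name split_train_test and the 80/20 proportions are illustrative assumptions, not values taken from this notebook.

```python
def split_train_test(df, test_fraction=0.2, seed=42):
    """Split a Spark DataFrame into a training set and a holdout test set.

    `df` is assumed to be a pyspark.sql.DataFrame. `test_fraction` and
    `seed` are illustrative defaults; choose values that suit your data.
    """
    # randomSplit returns one DataFrame per weight, in the same order.
    train_df, test_df = df.randomSplit(
        [1.0 - test_fraction, test_fraction], seed=seed
    )
    return train_df, test_df
```

Fixing the seed keeps the split reproducible across reruns, so the later test metrics stay comparable.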

Training

The following command starts an AutoML run. You must provide the column that the model should predict in the target_col argument.
When the run completes, you can follow the link to the best trial notebook to examine the training code. This notebook also includes a feature importance plot.
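A run of this kind might be started as sketched below. The wrapper function and the 30-minute timeout are assumptions for illustration; databricks.automl is imported inside the function because it is only available on Databricks Runtime for Machine Learning.

```python
def run_automl_classification(train_df, target_col="income", timeout_minutes=30):
    """Start an AutoML classification run and return its summary object.

    `train_df` is the training DataFrame. `timeout_minutes` is an
    illustrative value, not one prescribed by this notebook.
    """
    # Lazy import: the module exists only on Databricks Runtime for ML.
    from databricks import automl

    return automl.classify(
        train_df,
        target_col=target_col,
        timeout_minutes=timeout_minutes,
    )
```

In a notebook cell this would typically be called as summary = run_automl_classification(train_df).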

The following command displays information about the AutoML output.
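One way to inspect the output, sketched under the assumption that the object returned by the AutoML run exposes a best_trial with model_path, metrics, and notebook_path attributes (as the AutoML Python API documents them), is to gather the key fields into a dictionary. The helper name describe_best_trial is hypothetical.

```python
def describe_best_trial(summary):
    """Collect the main facts about the best trial of an AutoML run.

    Assumes `summary` has a `best_trial` attribute with `model_path`,
    `metrics`, and `notebook_path`; verify these names against the
    AutoML API version you are running.
    """
    best = summary.best_trial
    return {
        "model_uri": best.model_path,         # URI of the logged model artifact
        "metrics": best.metrics,              # evaluation metrics for the trial
        "notebook_path": best.notebook_path,  # generated training notebook
    }
```

The model_uri entry is the value that the inference steps later in this notebook expect in the model_uri variable.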

Next steps

• Explore the notebooks and experiments linked above.
• If the metrics for the best trial notebook look good, skip directly to the inference section.
• If you want to improve on the model generated by the best trial:
  • Go to the notebook with the best trial and clone it.
  • Edit the notebook as necessary to improve the model. For example, you might try different hyperparameters.
  • When you are satisfied with the model, note the URI where the artifact for the trained model is logged. Assign this URI to the model_uri variable in Cmd 12.

Inference

You can use the model trained by AutoML to make predictions on new data. The examples below demonstrate how to make predictions on data in pandas DataFrames and how to register the model as a Spark UDF for prediction on Spark DataFrames.

pandas DataFrame
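A minimal sketch of pandas inference, assuming model_uri points at the model logged by the chosen trial. The function name predict_pandas is hypothetical; mlflow.pyfunc.load_model and the predict method of the loaded model are standard MLflow calls.

```python
def predict_pandas(model_uri, features_pdf):
    """Load the logged model as a generic Python function and predict.

    `features_pdf` is assumed to be a pandas DataFrame holding only the
    feature columns (the income column already dropped).
    """
    # Lazy import so the function can be defined without MLflow installed.
    import mlflow

    model = mlflow.pyfunc.load_model(model_uri)
    return model.predict(features_pdf)
```

For example, with a pandas copy of the test set named test_pdf, predict_pandas(model_uri, test_pdf.drop(columns=["income"])) returns one predicted label per row.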

Spark DataFrame
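For Spark DataFrames, the model can be wrapped with mlflow.pyfunc.spark_udf, as sketched here. The helper name, the income_predicted output column, and the choice to pass every non-target column to the UDF are assumptions for illustration; result_type="string" is assumed because the income labels are strings.

```python
def add_predictions_spark(spark, df, model_uri,
                          target_col="income",
                          output_col="income_predicted"):
    """Return `df` with an extra column of model predictions.

    `spark` is the active SparkSession and `df` a Spark DataFrame.
    All columns except `target_col` are assumed to be model features.
    """
    # Lazy import so the function can be defined without MLflow installed.
    import mlflow

    # Register the logged model as a Spark UDF that returns string labels.
    predict_udf = mlflow.pyfunc.spark_udf(
        spark, model_uri=model_uri, result_type="string"
    )
    feature_cols = [c for c in df.columns if c != target_col]
    return df.withColumn(output_col, predict_udf(*feature_cols))
```

Because the UDF runs on the executors, this approach suits datasets that are too large to collect into a single pandas DataFrame.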

Test

Use the final model to make predictions on the holdout test set to estimate how the model would perform in a production setting. The diagram shows the breakdown between correct and incorrect predictions.
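The breakdown between correct and incorrect predictions can also be computed directly. The sketch below uses plain Python rather than a plotting library, so it is a stand-in for the diagram, not the notebook's own code. The function names are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count (actual, predicted) label pairs across the test set."""
    return Counter(zip(y_true, y_pred))


def accuracy(counts):
    """Fraction of predictions whose predicted label equals the actual one."""
    total = sum(counts.values())
    correct = sum(
        n for (actual, predicted), n in counts.items() if actual == predicted
    )
    return correct / total if total else 0.0


# Tiny illustrative example with made-up labels.
example_counts = confusion_counts(
    ["<=50K", "<=50K", ">50K", ">50K"],
    ["<=50K", ">50K", ">50K", ">50K"],
)
print(example_counts[("<=50K", ">50K")])  # actual <=50K predicted as >50K
print(accuracy(example_counts))
```

Each key of the counter is one cell of the confusion matrix, so pairs with differing labels show exactly which kind of mistake the model makes.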