AutoML Usage Example

Requirements

A cluster running Databricks Runtime 8.0 ML or above.

Install AutoML

%pip install "https://ml-team-public-read.s3.amazonaws.com/wheels/automl/e4ca550b-2b31-45d1-bc08-bf4f71b90502/databricks_automl-0.1.dev0-py3-none-any.whl"
Python interpreter will be restarted. Collecting databricks-automl==0.1.dev0 Downloading https://ml-team-public-read.s3.amazonaws.com/wheels/automl/e4ca550b-2b31-45d1-bc08-bf4f71b90502/databricks_automl-0.1.dev0-py3-none-any.whl (53 kB) Requirement already satisfied: mlflow in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from databricks-automl==0.1.dev0) (1.13.1) Collecting nbformat==5.0.4 Downloading nbformat-5.0.4-py3-none-any.whl (169 kB) Collecting ipywidgets==7.6.3 Downloading ipywidgets-7.6.3-py2.py3-none-any.whl (121 kB) Requirement already satisfied: pyarrow in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from databricks-automl==0.1.dev0) (1.0.1) Collecting ipykernel==5.1.4 Downloading ipykernel-5.1.4-py3-none-any.whl (116 kB) Requirement already satisfied: scikit-learn==0.23.2 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from databricks-automl==0.1.dev0) (0.23.2) Requirement already satisfied: xgboost in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from databricks-automl==0.1.dev0) (1.3.3) Requirement already satisfied: numpy in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from databricks-automl==0.1.dev0) (1.19.2) Collecting requests==2.22.0 Downloading requests-2.22.0-py2.py3-none-any.whl (57 kB) Requirement already satisfied: jinja2==2.11.2 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from databricks-automl==0.1.dev0) (2.11.2) Requirement already satisfied: pandas in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from databricks-automl==0.1.dev0) (1.1.3) Requirement already satisfied: wrapt==1.12.1 in 
/local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from databricks-automl==0.1.dev0) (1.12.1) Requirement already satisfied: matplotlib in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from databricks-automl==0.1.dev0) (3.2.2) Requirement already satisfied: shap in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from databricks-automl==0.1.dev0) (0.38.1) Collecting nbconvert==5.6.1 Downloading nbconvert-5.6.1-py2.py3-none-any.whl (455 kB) Collecting jupyter-client==5.3.4 Downloading jupyter_client-5.3.4-py2.py3-none-any.whl (92 kB) Requirement already satisfied: querystring-parser in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from mlflow->databricks-automl==0.1.dev0) (1.2.4) Requirement already satisfied: entrypoints in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from mlflow->databricks-automl==0.1.dev0) (0.3) Requirement already satisfied: protobuf>=3.6.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from mlflow->databricks-automl==0.1.dev0) (3.13.0) Requirement already satisfied: six>=1.10.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from mlflow->databricks-automl==0.1.dev0) (1.15.0) Requirement already satisfied: gunicorn; platform_system != "Windows" in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from mlflow->databricks-automl==0.1.dev0) (20.0.4) Requirement already satisfied: python-dateutil in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from mlflow->databricks-automl==0.1.dev0) (2.8.1) 
Requirement already satisfied: sqlparse>=0.3.1 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from mlflow->databricks-automl==0.1.dev0) (0.4.1) Collecting sqlalchemy Downloading SQLAlchemy-1.3.23-cp38-cp38-manylinux2010_x86_64.whl (1.3 MB) Requirement already satisfied: pyyaml in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from mlflow->databricks-automl==0.1.dev0) (5.4.1) Requirement already satisfied: databricks-cli>=0.8.7 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from mlflow->databricks-automl==0.1.dev0) (0.14.1) Collecting alembic<=1.4.1 Downloading alembic-1.4.1.tar.gz (1.1 MB) Requirement already satisfied: click>=7.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from mlflow->databricks-automl==0.1.dev0) (7.1.2) Requirement already satisfied: azure-storage-blob>=12.0.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from mlflow->databricks-automl==0.1.dev0) (12.7.1) Collecting prometheus-flask-exporter Downloading prometheus_flask_exporter-0.18.1.tar.gz (21 kB) Requirement already satisfied: docker>=4.0.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from mlflow->databricks-automl==0.1.dev0) (4.4.1) Requirement already satisfied: Flask in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from mlflow->databricks-automl==0.1.dev0) (1.1.2) Requirement already satisfied: gitpython>=2.1.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from mlflow->databricks-automl==0.1.dev0) (3.1.12) Requirement already satisfied: cloudpickle in 
/local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from mlflow->databricks-automl==0.1.dev0) (1.6.0) Requirement already satisfied: traitlets>=4.1 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from nbformat==5.0.4->databricks-automl==0.1.dev0) (5.0.5) Collecting jsonschema!=2.5.0,>=2.4 Downloading jsonschema-3.2.0-py2.py3-none-any.whl (56 kB) Requirement already satisfied: jupyter-core in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from nbformat==5.0.4->databricks-automl==0.1.dev0) (4.6.3) Requirement already satisfied: ipython-genutils in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from nbformat==5.0.4->databricks-automl==0.1.dev0) (0.2.0) Collecting widgetsnbextension~=3.5.0 Downloading widgetsnbextension-3.5.1-py2.py3-none-any.whl (2.2 MB) Collecting jupyterlab-widgets>=1.0.0; python_version >= "3.6" Downloading jupyterlab_widgets-1.0.0-py3-none-any.whl (243 kB) Requirement already satisfied: ipython>=4.0.0; python_version >= "3.3" in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from ipywidgets==7.6.3->databricks-automl==0.1.dev0) (7.19.0) Requirement already satisfied: tornado>=4.2 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from ipykernel==5.1.4->databricks-automl==0.1.dev0) (6.0.4) Requirement already satisfied: threadpoolctl>=2.0.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from scikit-learn==0.23.2->databricks-automl==0.1.dev0) (2.1.0) Requirement already satisfied: scipy>=0.19.1 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from 
scikit-learn==0.23.2->databricks-automl==0.1.dev0) (1.5.2) Requirement already satisfied: joblib>=0.11 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from scikit-learn==0.23.2->databricks-automl==0.1.dev0) (0.17.0) Requirement already satisfied: certifi>=2017.4.17 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from requests==2.22.0->databricks-automl==0.1.dev0) (2020.12.5) Requirement already satisfied: chardet<3.1.0,>=3.0.2 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from requests==2.22.0->databricks-automl==0.1.dev0) (3.0.4) Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from requests==2.22.0->databricks-automl==0.1.dev0) (1.25.11) Collecting idna<2.9,>=2.5 Downloading idna-2.8-py2.py3-none-any.whl (58 kB) Requirement already satisfied: MarkupSafe>=0.23 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from jinja2==2.11.2->databricks-automl==0.1.dev0) (1.1.1) Requirement already satisfied: pytz>=2017.2 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from pandas->databricks-automl==0.1.dev0) (2020.5) Requirement already satisfied: kiwisolver>=1.0.1 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from matplotlib->databricks-automl==0.1.dev0) (1.3.0) Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from matplotlib->databricks-automl==0.1.dev0) (2.4.7) Requirement already satisfied: cycler>=0.10 in 
/local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from matplotlib->databricks-automl==0.1.dev0) (0.10.0) Requirement already satisfied: slicer==0.0.7 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from shap->databricks-automl==0.1.dev0) (0.0.7) Requirement already satisfied: numba in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from shap->databricks-automl==0.1.dev0) (0.52.0) Requirement already satisfied: tqdm>4.25.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from shap->databricks-automl==0.1.dev0) (4.50.2) Collecting bleach Downloading bleach-3.3.0-py2.py3-none-any.whl (283 kB) Collecting mistune<2,>=0.8.1 Downloading mistune-0.8.4-py2.py3-none-any.whl (16 kB) Collecting pandocfilters>=1.4.1 Downloading pandocfilters-1.4.3.tar.gz (16 kB) Requirement already satisfied: pygments in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from nbconvert==5.6.1->databricks-automl==0.1.dev0) (2.7.2) Collecting defusedxml Downloading defusedxml-0.6.0-py2.py3-none-any.whl (23 kB) Collecting testpath Downloading testpath-0.4.4-py2.py3-none-any.whl (163 kB) Requirement already satisfied: pyzmq>=13 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from jupyter-client==5.3.4->databricks-automl==0.1.dev0) (19.0.2) Requirement already satisfied: setuptools in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from protobuf>=3.6.0->mlflow->databricks-automl==0.1.dev0) (50.3.1.post20201107) Requirement already satisfied: tabulate>=0.7.7 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from 
databricks-cli>=0.8.7->mlflow->databricks-automl==0.1.dev0) (0.8.7) Requirement already satisfied: Mako in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from alembic<=1.4.1->mlflow->databricks-automl==0.1.dev0) (1.1.3) Requirement already satisfied: python-editor>=0.3 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from alembic<=1.4.1->mlflow->databricks-automl==0.1.dev0) (1.0.4) Requirement already satisfied: msrest>=0.6.18 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from azure-storage-blob>=12.0.0->mlflow->databricks-automl==0.1.dev0) (0.6.20) Requirement already satisfied: cryptography>=2.1.4 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from azure-storage-blob>=12.0.0->mlflow->databricks-automl==0.1.dev0) (3.1.1) Requirement already satisfied: azure-core<2.0.0,>=1.10.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from azure-storage-blob>=12.0.0->mlflow->databricks-automl==0.1.dev0) (1.10.0) Collecting prometheus_client Downloading prometheus_client-0.9.0-py2.py3-none-any.whl (53 kB) Requirement already satisfied: websocket-client>=0.32.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from docker>=4.0.0->mlflow->databricks-automl==0.1.dev0) (0.57.0) Requirement already satisfied: itsdangerous>=0.24 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from Flask->mlflow->databricks-automl==0.1.dev0) (1.1.0) Requirement already satisfied: Werkzeug>=0.15 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from Flask->mlflow->databricks-automl==0.1.dev0) (1.0.1) 
Requirement already satisfied: gitdb<5,>=4.0.1 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from gitpython>=2.1.0->mlflow->databricks-automl==0.1.dev0) (4.0.5) Requirement already satisfied: attrs>=17.4.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from jsonschema!=2.5.0,>=2.4->nbformat==5.0.4->databricks-automl==0.1.dev0) (20.3.0) Collecting pyrsistent>=0.14.0 Downloading pyrsistent-0.17.3.tar.gz (106 kB) Collecting notebook>=4.4.1 Downloading notebook-6.2.0-py3-none-any.whl (9.5 MB) Requirement already satisfied: prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets==7.6.3->databricks-automl==0.1.dev0) (3.0.8) Requirement already satisfied: decorator in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets==7.6.3->databricks-automl==0.1.dev0) (4.4.2) Requirement already satisfied: pexpect>4.3; sys_platform != "win32" in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets==7.6.3->databricks-automl==0.1.dev0) (4.8.0) Requirement already satisfied: backcall in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets==7.6.3->databricks-automl==0.1.dev0) (0.2.0) Requirement already satisfied: pickleshare in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets==7.6.3->databricks-automl==0.1.dev0) (0.7.5) Requirement already satisfied: 
jedi>=0.10 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from ipython>=4.0.0; python_version >= "3.3"->ipywidgets==7.6.3->databricks-automl==0.1.dev0) (0.17.2) Requirement already satisfied: llvmlite<0.36,>=0.35.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from numba->shap->databricks-automl==0.1.dev0) (0.35.0) Requirement already satisfied: packaging in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from bleach->nbconvert==5.6.1->databricks-automl==0.1.dev0) (20.4) Collecting webencodings Downloading webencodings-0.5.1-py2.py3-none-any.whl (11 kB) Requirement already satisfied: requests-oauthlib>=0.5.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from msrest>=0.6.18->azure-storage-blob>=12.0.0->mlflow->databricks-automl==0.1.dev0) (1.3.0) Requirement already satisfied: isodate>=0.6.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from msrest>=0.6.18->azure-storage-blob>=12.0.0->mlflow->databricks-automl==0.1.dev0) (0.6.0) Requirement already satisfied: cffi!=1.11.3,>=1.8 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from cryptography>=2.1.4->azure-storage-blob>=12.0.0->mlflow->databricks-automl==0.1.dev0) (1.14.3) Requirement already satisfied: smmap<4,>=3.0.1 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from gitdb<5,>=4.0.1->gitpython>=2.1.0->mlflow->databricks-automl==0.1.dev0) (3.0.5) Collecting argon2-cffi Downloading argon2_cffi-20.1.0-cp35-abi3-manylinux1_x86_64.whl (97 kB) Collecting Send2Trash>=1.5.0 Downloading Send2Trash-1.5.0-py3-none-any.whl (12 kB) Collecting terminado>=0.8.3 Downloading 
terminado-0.9.2-py3-none-any.whl (14 kB) Requirement already satisfied: wcwidth in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from prompt-toolkit!=3.0.0,!=3.0.1,<3.1.0,>=2.0.0->ipython>=4.0.0; python_version >= "3.3"->ipywidgets==7.6.3->databricks-automl==0.1.dev0) (0.2.5) Requirement already satisfied: ptyprocess>=0.5 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from pexpect>4.3; sys_platform != "win32"->ipython>=4.0.0; python_version >= "3.3"->ipywidgets==7.6.3->databricks-automl==0.1.dev0) (0.6.0) Requirement already satisfied: parso<0.8.0,>=0.7.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from jedi>=0.10->ipython>=4.0.0; python_version >= "3.3"->ipywidgets==7.6.3->databricks-automl==0.1.dev0) (0.7.0) Requirement already satisfied: oauthlib>=3.0.0 in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from requests-oauthlib>=0.5.0->msrest>=0.6.18->azure-storage-blob>=12.0.0->mlflow->databricks-automl==0.1.dev0) (3.1.0) Requirement already satisfied: pycparser in /local_disk0/.ephemeral_nfs/envs/pythonEnv-ccd3a263-9f8d-4732-83a3-5b0e6a97d252/lib/python3.8/site-packages (from cffi!=1.11.3,>=1.8->cryptography>=2.1.4->azure-storage-blob>=12.0.0->mlflow->databricks-automl==0.1.dev0) (2.20) Building wheels for collected packages: alembic, prometheus-flask-exporter, pandocfilters, pyrsistent Building wheel for alembic (setup.py): started Building wheel for alembic (setup.py): finished with status 'done' Created wheel for alembic: filename=alembic-1.4.1-py2.py3-none-any.whl size=158155 sha256=9da80875bfe429dccf49c49946d15bbac3834f376f2b12bd520f67fbce3d3c39 Stored in directory: /home/root/.cache/pip/wheels/9d/de/6d/ca8d461ec29e010b1267d7353d0b058819770f7680bb9360e4 Building wheel for prometheus-flask-exporter 
(setup.py): started Building wheel for prometheus-flask-exporter (setup.py): finished with status 'done' Created wheel for prometheus-flask-exporter: filename=prometheus_flask_exporter-0.18.1-py3-none-any.whl size=17157 sha256=54c5c966a261d22b3e536c6b2472d1ee141fa8d2ab07009e078574efeb857a72 Stored in directory: /home/root/.cache/pip/wheels/12/1a/8d/0c016e06370d07f82def661b6cb7d91d4e6b4ff7f2982e9f2c Building wheel for pandocfilters (setup.py): started Building wheel for pandocfilters (setup.py): finished with status 'done' Created wheel for pandocfilters: filename=pandocfilters-1.4.3-py3-none-any.whl size=7991 sha256=e0e60e328ef1cc02211e0ea76c352bcc4176ad066d34396c366054452a1f04e8 Stored in directory: /home/root/.cache/pip/wheels/fc/39/52/8d6f3cec1cca4ceb44d658427c35711b19d89dbc4914af657f Building wheel for pyrsistent (setup.py): started Building wheel for pyrsistent (setup.py): finished with status 'done' Created wheel for pyrsistent: filename=pyrsistent-0.17.3-cp38-cp38-linux_x86_64.whl size=119735 sha256=b7b2919659da0b1a3c7a4b919b9a598d79c215ae2f3d8c9188deb381721e070b Stored in directory: /home/root/.cache/pip/wheels/3d/22/08/7042eb6309c650c7b53615d5df5cc61f1ea9680e7edd3a08d2 Successfully built alembic prometheus-flask-exporter pandocfilters pyrsistent Installing collected packages: pyrsistent, jsonschema, nbformat, webencodings, bleach, mistune, pandocfilters, defusedxml, testpath, nbconvert, jupyter-client, ipykernel, argon2-cffi, Send2Trash, terminado, prometheus-client, notebook, widgetsnbextension, jupyterlab-widgets, ipywidgets, idna, requests, databricks-automl, sqlalchemy, alembic, prometheus-flask-exporter Attempting uninstall: jupyter-client Found existing installation: jupyter-client 6.1.7 Uninstalling jupyter-client-6.1.7: Successfully uninstalled jupyter-client-6.1.7 Attempting uninstall: ipykernel Found existing installation: ipykernel 5.3.4 Uninstalling ipykernel-5.3.4: Successfully uninstalled ipykernel-5.3.4 Attempting uninstall: idna Found 
existing installation: idna 2.10 Uninstalling idna-2.10: Successfully uninstalled idna-2.10 Attempting uninstall: requests Found existing installation: requests 2.24.0 Uninstalling requests-2.24.0: Successfully uninstalled requests-2.24.0 ERROR: After October 2020 you may experience errors when installing or updating packages. This is because pip will change the way that it resolves dependency conflicts. We recommend you use --use-feature=2020-resolver to test your packages with the new resolver before it becomes the default. aiohttp 3.6.3 requires yarl<1.6.0,>=1.0, but you'll have yarl 1.6.3 which is incompatible. notebook 6.2.0 requires tornado>=6.1, but you'll have tornado 6.0.4 which is incompatible. Successfully installed Send2Trash-1.5.0 alembic-1.4.1 argon2-cffi-20.1.0 bleach-3.3.0 databricks-automl-0.1.dev0 defusedxml-0.6.0 idna-2.8 ipykernel-5.1.4 ipywidgets-7.6.3 jsonschema-3.2.0 jupyter-client-5.3.4 jupyterlab-widgets-1.0.0 mistune-0.8.4 nbconvert-5.6.1 nbformat-5.0.4 notebook-6.2.0 pandocfilters-1.4.3 prometheus-client-0.9.0 prometheus-flask-exporter-0.18.1 pyrsistent-0.17.3 requests-2.22.0 sqlalchemy-1.3.23 terminado-0.9.2 testpath-0.4.4 webencodings-0.5.1 widgetsnbextension-3.5.1 Python interpreter will be restarted.
import databricks.automl
import mlflow
import pandas
import sklearn.metrics
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

Census Income Dataset

This dataset contains census data from the 1994 Census database. Each row represents a group of individuals. The goal is to determine whether a group has an income of over 50K a year or not. This classification is represented as a string in the income column with the values <=50K or >50K.
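As a rough illustration of the label format described above, the sketch below maps the income strings to a binary target, which can be handy when computing metrics by hand. The notebook itself passes the string column to AutoML unchanged. The helper name `income_to_label`, the sample values, and the whitespace stripping are assumptions made for this example and are not part of the notebook.

```python
def income_to_label(income: str) -> int:
    """Map an income string to 1 (over 50K) or 0 (at most 50K)."""
    value = income.strip()  # assumption: raw strings may carry stray whitespace
    if value == ">50K":
        return 1
    if value == "<=50K":
        return 0
    raise ValueError(f"unexpected income value: {income!r}")


# Hypothetical sample values, not taken from the dataset
sample_incomes = ["<=50K", ">50K", " <=50K"]
labels = [income_to_label(value) for value in sample_incomes]
print(labels)
```

Raising on unknown values, rather than defaulting to one class, keeps malformed rows from silently skewing the class balance.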

schema = StructType([
  StructField("age", DoubleType(), False),
  StructField("workclass", StringType(), False),
  StructField("fnlwgt", DoubleType(), False),
  StructField("education", StringType(), False),
  StructField("education_num", DoubleType(), False),
  StructField("marital_status", StringType(), False),
  StructField("occupation", StringType(), False),
  StructField("relationship", StringType(), False),
  StructField("race", StringType(), False),
  StructField("sex", StringType(), False),
  StructField("capital_gain", DoubleType(), False),
  StructField("capital_loss", DoubleType(), False),
  StructField("hours_per_week", DoubleType(), False),
  StructField("native_country", StringType(), False),
  StructField("income", StringType(), False)
])
input_df = spark.read.format("csv").schema(schema).load("/databricks-datasets/adult/adult.data")
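For readers without a Spark session at hand, the column layout defined by the schema above can be sketched with the standard library alone. This is only an illustration: the helper `parse_rows`, the `COLUMNS` list, and the sample line (built from the first row shown in this notebook's output) are assumptions, and the comma-plus-space layout of the raw file is assumed rather than confirmed by the notebook.

```python
import csv
import io

# Column names and Python types mirroring the Spark schema
# (DoubleType -> float, StringType -> str).
COLUMNS = [
    ("age", float), ("workclass", str), ("fnlwgt", float),
    ("education", str), ("education_num", float), ("marital_status", str),
    ("occupation", str), ("relationship", str), ("race", str), ("sex", str),
    ("capital_gain", float), ("capital_loss", float),
    ("hours_per_week", float), ("native_country", str), ("income", str),
]


def parse_rows(text):
    """Parse CSV text into a list of dicts typed according to COLUMNS."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
        rows.append({name: cast(value) for (name, cast), value in zip(COLUMNS, fields)})
    return rows


# Hypothetical raw line, assuming comma-plus-space separators
sample = ("17, ?, 34019, 10th, 6, Never-married, ?, Own-child, White, Male, "
          "0, 0, 20, United-States, <=50K\n")
rows = parse_rows(sample)
print(rows[0]["age"], rows[0]["income"])
```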

Train / Test Split

train_df, test_df = input_df.randomSplit([0.99, 0.01], seed=42)
display(train_df)
 
    age  workclass  fnlwgt  education     education_num  marital_status  occupation  relationship    race                sex     capital_gain  capital_loss  hours_per_week  native_country  income
 1  17   ?          34019   10th          6              Never-married   ?           Own-child       White               Male    0             0             20              United-States   <=50K
 2  17   ?          34088   12th          8              Never-married   ?           Own-child       White               Female  0             0             25              United-States   <=50K
 3  17   ?          41643   11th          7              Never-married   ?           Own-child       White               Female  0             0             15              United-States   <=50K
 4  17   ?          47407   11th          7              Never-married   ?           Own-child       White               Male    0             0             10              United-States   <=50K
 5  17   ?          48703   11th          7              Never-married   ?           Own-child       White               Female  0             0             30              United-States   <=50K
 6  17   ?          48751   11th          7              Never-married   ?           Own-child       Black               Female  0             0             40              United-States   <=50K
 7  17   ?          67808   10th          6              Never-married   ?           Own-child       White               Male    0             0             40              United-States   <=50K
 8  17   ?          80077   11th          7              Never-married   ?           Own-child       White               Female  0             0             20              United-States   <=50K
 9  17   ?          86786   10th          6              Never-married   ?           Own-child       White               Female  0             0             40              United-States   <=50K
10  17   ?          89870   10th          6              Never-married   ?           Own-child       White               Male    0             0             40              United-States   <=50K
11  17   ?          94366   10th          6              Never-married   ?           Other-relative  White               Male    0             0             6               United-States   <=50K
12  17   ?          103810  12th          8              Never-married   ?           Own-child       White               Male    0             0             40              United-States   <=50K
13  17   ?          104025  11th          7              Never-married   ?           Own-child       White               Male    0             0             18              United-States   <=50K
14  17   ?          110998  Some-college  10             Never-married   ?           Own-child       Asian-Pac-Islander  Female  0             0             40              Philippines     <=50K
15  17   ?          112942  10th          6              Never-married   ?           Own-child       White               Male    0             0             40              United-States   <=50K
16  17   ?          114798  11th          7              Never-married   ?           Own-child       White               Female  0             0             18              United-States   <=50K
17  17   ?          127003  9th           5              Never-married   ?           Own-child       Black               Male    0             0             40              United-States   <=50K
18  17   ?          138507  10th          6              Never-married   ?           Own-child       White               Male    0             0             20              United-States   <=50K

Showing the first 1000 rows.
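The randomSplit call above assigns each row to a split at random according to the given weights (which Spark normalizes if they do not sum to 1), so the resulting sizes are approximate rather than exactly 99% and 1%. The following pure-Python sketch illustrates the idea only; it is not Spark's implementation, and the function name `random_split` and the integer "rows" are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def random_split(rows, weights, seed):
    """Assign each row to one split, with probability proportional to its weight."""
    total = sum(weights)
    # Cumulative upper bounds of each split's share of [0, 1)
    bounds = list(accumulate(weight / total for weight in weights))
    rng = random.Random(seed)
    splits = [[] for _ in weights]
    for row in rows:
        draw = rng.random()
        # min() guards against a last bound that rounds slightly below 1.0
        index = min(bisect_right(bounds, draw), len(weights) - 1)
        splits[index].append(row)
    return splits


# Hypothetical data: integers stand in for DataFrame rows
train_rows, test_rows = random_split(list(range(10_000)), [0.99, 0.01], seed=42)
print(len(train_rows), len(test_rows))
```

Fixing the seed makes the split repeatable across runs, which is the same reason the notebook passes seed=42.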

Training

The following command starts an AutoML run. When the run completes, you can follow the links to the best trial notebook and its training code, as well as to a feature importance plot.

summary = databricks.automl.classify(train_df, target_col='income', data_dir='dbfs:/automl/adult', timeout_minutes=30)
To see analysis of your data while training completes, open the data exploration notebook here:
https://e2-dogfood.staging.cloud.databricks.com/?o=6051921418418893#notebook/96784558541322
**********************************************************************************************************
Trials for training the dataset have been kicked off. You can track the completed trials in the MLflow experiment here:
https://e2-dogfood.staging.cloud.databricks.com/?o=6051921418418893#mlflow/experiments/96784558541319/s?orderByKey=metrics.%60val_f1_score%60&orderByAsc=false
Notebooks that generate the trials can be edited to tweak the setup, add hyperparameters and re-run the trials.
All re-run notebooks will log the trials under the same experiment.
Generated notebooks contain instructions to load models from your favorite trials.
**********************************************************************************************************
0%| | 0/20 [00:00<?, ?trial/s, best loss=?]
5%|▌ | 1/20 [00:32<10:12, 32.25s/trial, best loss: -0.830889053903534]
15%|█▌ | 3/20 [00:39<06:41, 23.63s/trial, best loss: -0.830889053903534]
20%|██ | 4/20 [00:40<04:29, 16.84s/trial, best loss: -0.8590671548371503]
25%|██▌ | 5/20 [00:51<03:46, 15.09s/trial, best loss: -0.8590671548371503]
30%|███ | 6/20 [00:53<02:36, 11.16s/trial, best loss: -0.8590671548371503]
35%|███▌ | 7/20 [00:54<01:45, 8.12s/trial, best loss: -0.8590671548371503]
40%|████ | 8/20 [00:56<01:15, 6.28s/trial, best loss: -0.8590671548371503]
50%|█████ | 10/20 [00:57<00:45, 4.55s/trial, best loss: -0.8590671548371503]
60%|██████ | 12/20 [00:58<00:26, 3.33s/trial, best loss: -0.863204097133928]
65%|██████▌ | 13/20 [00:59<00:18, 2.63s/trial, best loss: -0.863204097133928]
70%|███████ | 14/20 [01:00<00:12, 2.14s/trial, best loss: -0.863204097133928]
75%|███████▌ | 15/20 [01:02<00:10, 2.10s/trial, best loss: -0.863204097133928]
80%|████████ | 16/20 [01:03<00:07, 1.77s/trial, best loss: -0.863204097133928]
85%|████████▌ | 17/20 [01:07<00:07, 2.44s/trial, best loss: -0.863204097133928]
90%|█████████ | 18/20 [01:09<00:04, 2.31s/trial, best loss: -0.863204097133928]
95%|█████████▌| 19/20 [01:11<00:02, 2.22s/trial, best loss: -0.863204097133928]
100%|██████████| 20/20 [01:12<00:00, 1.85s/trial, best loss: -0.863204097133928]
100%|██████████| 20/20 [01:12<00:00, 3.61s/trial, best loss: -0.863204097133928]
help(summary)
Help on AutoClassificationSummary in module databricks.automl.result object:

class AutoClassificationSummary(builtins.object)
 |  AutoClassificationSummary(experiment: mlflow.entities.experiment.Experiment, trials: List[databricks.automl.result.TrialInfo])
 |
 |  Summary of an AutoML run, including the MLflow experiment and list of detailed summaries for each trial.
 |
 |  The MLflow experiment contains high level information, such as the root artifact location, experiment ID,
 |  and experiment tags. The list of trials contains detailed summaries of each trial, such as the notebook and model
 |  location, training parameters, and overall metrics.
 |
 |  Example usage:
 |      >>> summary.experiment.experiment_id
 |      32639121
 |      >>> len(summary.trials)
 |      10
 |      >>> print(summary.best_trial)
 |      Model: DecisionTreeClassifier(ccp_alpha=0.0, class_weight=None, criterion='gini',
 |                      max_depth=3, max_features=None, max_leaf_nodes=None,
 |                      min_impurity_decrease=0.0, min_impurity_split=None,
 |                      ...
 |      Model path: dbfs:/databricks/mlflow-tracking/32639121/7ff5e517fd524f30a77b777f5be46d24/artifacts/model
 |      Preprocessors: [('onehot', OneHotEncoder(categories='auto', drop=None, dtype=<class 'numpy.float64'>,
 |                      handle_unknown='ignore', sparse=True), ['col2', 'col3'])]
 |      Training duration: 0.056 minutes
 |      Weighted F1 score: 0.901
 |      >>> best_model = summary.best_trial.load_model()
 |      >>> best_model.predict(data)
 |      array([1, 0, 1])
 |
 |  Methods defined here:
 |
 |  __init__(self, experiment: mlflow.entities.experiment.Experiment, trials: List[databricks.automl.result.TrialInfo])
 |      :param experiment: MLflow experiment object for AutoML run
 |      :param trials: List of TrialInfos for all trials, sorted descending by evaluation metric (best first)
 |
 |  __str__(self) -> str
 |      Returns a string with a detailed summary of the best trial as well as statistics about the entire experiment.
 |
 |      Example usage:
 |          >>> print(summary)
 |          Overall summary:
 |              Experiment ID: 32646004
 |              Number of trials: 10
 |              F1 distribution: min: 0.497, median: 0.612, max: 0.956
 |          Best trial:
 |              Model: DecisionTreeClassifier
 |              Model path: dbfs:/databricks/mlflow-tracking/32646004/3d6d726079a4439fb1bc687295f77da8/artifacts/model
 |              Preprocessors: None
 |              Training duration: 0.028 minutes
 |              Weighted F1 score: 0.952
 |
 |  ----------------------------------------------------------------------
 |  Readonly properties defined here:
 |
 |  best_trial
 |      The trial corresponding to the best performing model of all completed trials.
 |
 |  experiment
 |      The MLflow experiment object.
 |
 |  f1_distribution
 |      The distribution of F1 scores across trials.
 |
 |  trials
 |      The list of detailed summaries for each trial.
 |
 |  ----------------------------------------------------------------------
 |  Data descriptors defined here:
 |
 |  __dict__
 |      dictionary for instance variables (if defined)
 |
 |  __weakref__
 |      list of weak references to the object (if defined)
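The summary above reports a "Weighted F1 score" for each trial. As a hedged illustration of what that metric usually means, the sketch below computes the standard support-weighted F1 in pure Python, which is the same definition that sklearn.metrics.f1_score uses with average="weighted". Whether AutoML computes its metric in exactly this way is an assumption; the function name `weighted_f1` and the sample labels are hypothetical.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    support = Counter(y_true)
    total = len(y_true)
    score = 0.0
    for label, count in support.items():
        true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        false_pos = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        false_neg = count - true_pos
        denominator = 2 * true_pos + false_pos + false_neg
        f1 = 2 * true_pos / denominator if denominator else 0.0
        score += (count / total) * f1
    return score


# Hypothetical labels, not taken from an actual AutoML run
y_true = ["<=50K", "<=50K", "<=50K", ">50K"]
y_pred = ["<=50K", "<=50K", ">50K", ">50K"]
print(round(weighted_f1(y_true, y_pred), 4))
```

Weighting by support matters for this dataset because the two income classes are not equally frequent, so an unweighted average would overstate the influence of the smaller class.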

Next Steps

  • Explore the notebooks and experiments linked above.
  • If the metrics for the best trial notebook look good, skip directly to the Inference section.
  • If you want to improve on the model generated by the best trial:
    • Go to the notebook with the best trial and clone it.
    • Make necessary changes to the model to improve it. For example, you might try different hyperparameters.
    • Note the model URI where the artifact for the retrained model is logged, and set the model_uri variable in cell 15 (the first cell of the Inference section) to that URI.
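
The last step above can be sketched as follows. MLflow accepts model URIs of the form runs:/<run_id>/<artifact_path>. The helper name build_model_uri, the placeholder run ID, and the assumption that the cloned notebook logs its model under the artifact path "model" are illustrative and not taken from the generated notebooks.

```python
def build_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """Return an MLflow 'runs:/' URI for a model logged by a run.

    Assumes the model was logged under `artifact_path` (hypothetical default: "model").
    """
    if not run_id:
        raise ValueError("run_id must be a non-empty string")
    return f"runs:/{run_id}/{artifact_path.strip('/')}"


# Placeholder run ID; replace it with the run ID shown in the cloned notebook.
model_uri = build_model_uri("<run-id-from-cloned-notebook>")
print(model_uri)
```

The resulting string can be assigned to model_uri in place of summary.best_trial.model_path.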

Inference

The model trained by AutoML can be used to make predictions on new data. The examples below show how to make predictions on a pandas DataFrame and how to register the model as a Spark UDF for prediction on a Spark DataFrame.

pandas DataFrame

import mlflow

# Use the best trial's model by default
model_uri = summary.best_trial.model_path
# To use a model from a modified notebook instead, uncomment and set:
# model_uri = "<model-uri-from-generated-notebook>"

# Prepare test dataset: 'income' is the label, all other columns are features
test_pdf = test_df.toPandas()
y_test = test_pdf['income']
X_test = test_pdf.drop('income', axis=1)
 
# Run inference using the best model
model = mlflow.pyfunc.load_model(model_uri)
predictions = model.predict(X_test)
test_pdf['income_predicted'] = predictions
display(test_pdf)
 
# | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | income | income_predicted
1 | 17 | ? | 64785 | 10th | 6 | Never-married | ? | Own-child | White | Male | 0 | 0 | 30 | United-States | <=50K | <=50K
2 | 17 | ? | 256173 | 10th | 6 | Never-married | ? | Own-child | White | Female | 0 | 0 | 15 | United-States | <=50K | <=50K
3 | 17 | Private | 166290 | 9th | 5 | Never-married | Other-service | Own-child | White | Female | 0 | 0 | 20 | United-States | <=50K | <=50K
4 | 17 | Private | 197850 | 11th | 7 | Never-married | Adm-clerical | Own-child | Asian-Pac-Islander | Female | 0 | 0 | 24 | United-States | <=50K | <=50K
5 | 17 | Private | 216137 | 11th | 7 | Never-married | Sales | Own-child | White | Female | 0 | 0 | 8 | United-States | <=50K | <=50K
6 | 17 | Private | 234780 | HS-grad | 9 | Never-married | Farming-fishing | Own-child | Black | Male | 0 | 0 | 40 | United-States | <=50K | <=50K
7 | 18 | Private | 156874 | 12th | 8 | Never-married | Other-service | Own-child | White | Male | 0 | 0 | 27 | United-States | <=50K | <=50K
8 | 18 | Private | 178142 | HS-grad | 9 | Never-married | Craft-repair | Own-child | White | Male | 0 | 0 | 40 | United-States | <=50K | <=50K
9 | 18 | Private | 231562 | HS-grad | 9 | Never-married | Sales | Own-child | White | Female | 0 | 0 | 33 | United-States | <=50K | <=50K
10 | 18 | Private | 396270 | HS-grad | 9 | Never-married | Adm-clerical | Own-child | White | Male | 0 | 0 | 25 | United-States | <=50K | <=50K
11 | 19 | Private | 47577 | Some-college | 10 | Never-married | Transport-moving | Not-in-family | White | Male | 0 | 0 | 50 | United-States | <=50K | <=50K
12 | 19 | Private | 116562 | HS-grad | 9 | Never-married | Other-service | Own-child | White | Female | 0 | 0 | 40 | United-States | <=50K | <=50K
13 | 19 | Private | 160033 | Some-college | 10 | Never-married | Protective-serv | Own-child | White | Female | 0 | 0 | 30 | United-States | <=50K | <=50K
14 | 20 | ? | 183083 | Some-college | 10 | Never-married | ? | Own-child | White | Female | 0 | 0 | 20 | United-States | <=50K | <=50K
15 | 20 | Private | 211049 | Some-college | 10 | Never-married | Exec-managerial | Own-child | White | Female | 0 | 0 | 30 | United-States | <=50K | <=50K
16 | 20 | Private | 380544 | Assoc-acdm | 12 | Never-married | Transport-moving | Own-child | White | Male | 0 | 0 | 20 | United-States | <=50K | <=50K
17 | 20 | State-gov | 215443 | Some-college | 10 | Never-married | Craft-repair | Own-child | White | Male | 0 | 0 | 38 | United-States | <=50K | <=50K
18 | 21 | ? | 152328 | Some-college | 10 | Never-married | ? | Own-child | White | Male | 0 | 0 | 20 | United-States | <=50K | <=50K

Showing all 340 rows.

Spark DataFrame

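# Feature columns the model expects, read from the input schema logged with the model
features = mlflow.pyfunc.load_model(model_uri).metadata.get_input_schema().column_names()
# Wrap the model as a Spark UDF that returns the predicted label as a string
predict_udf = mlflow.pyfunc.spark_udf(spark, model_uri=model_uri, result_type="string")
display(test_df.withColumn("income_predicted", predict_udf(*features)))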
 
# | age | workclass | fnlwgt | education | education_num | marital_status | occupation | relationship | race | sex | capital_gain | capital_loss | hours_per_week | native_country | income | income_predicted
1 | 17 | ? | 64785 | 10th | 6 | Never-married | ? | Own-child | White | Male | 0 | 0 | 30 | United-States | <=50K | <=50K
2 | 17 | ? | 256173 | 10th | 6 | Never-married | ? | Own-child | White | Female | 0 | 0 | 15 | United-States | <=50K | <=50K
3 | 17 | Private | 166290 | 9th | 5 | Never-married | Other-service | Own-child | White | Female | 0 | 0 | 20 | United-States | <=50K | <=50K
4 | 17 | Private | 197850 | 11th | 7 | Never-married | Adm-clerical | Own-child | Asian-Pac-Islander | Female | 0 | 0 | 24 | United-States | <=50K | <=50K
5 | 17 | Private | 216137 | 11th | 7 | Never-married | Sales | Own-child | White | Female | 0 | 0 | 8 | United-States | <=50K | <=50K
6 | 17 | Private | 234780 | HS-grad | 9 | Never-married | Farming-fishing | Own-child | Black | Male | 0 | 0 | 40 | United-States | <=50K | <=50K
7 | 18 | Private | 156874 | 12th | 8 | Never-married | Other-service | Own-child | White | Male | 0 | 0 | 27 | United-States | <=50K | <=50K
8 | 18 | Private | 178142 | HS-grad | 9 | Never-married | Craft-repair | Own-child | White | Male | 0 | 0 | 40 | United-States | <=50K | <=50K
9 | 18 | Private | 231562 | HS-grad | 9 | Never-married | Sales | Own-child | White | Female | 0 | 0 | 33 | United-States | <=50K | <=50K
10 | 18 | Private | 396270 | HS-grad | 9 | Never-married | Adm-clerical | Own-child | White | Male | 0 | 0 | 25 | United-States | <=50K | <=50K
11 | 19 | Private | 47577 | Some-college | 10 | Never-married | Transport-moving | Not-in-family | White | Male | 0 | 0 | 50 | United-States | <=50K | <=50K
12 | 19 | Private | 116562 | HS-grad | 9 | Never-married | Other-service | Own-child | White | Female | 0 | 0 | 40 | United-States | <=50K | <=50K
13 | 19 | Private | 160033 | Some-college | 10 | Never-married | Protective-serv | Own-child | White | Female | 0 | 0 | 30 | United-States | <=50K | <=50K
14 | 20 | ? | 183083 | Some-college | 10 | Never-married | ? | Own-child | White | Female | 0 | 0 | 20 | United-States | <=50K | <=50K
15 | 20 | Private | 211049 | Some-college | 10 | Never-married | Exec-managerial | Own-child | White | Female | 0 | 0 | 30 | United-States | <=50K | <=50K
16 | 20 | Private | 380544 | Assoc-acdm | 12 | Never-married | Transport-moving | Own-child | White | Male | 0 | 0 | 20 | United-States | <=50K | <=50K
17 | 20 | State-gov | 215443 | Some-college | 10 | Never-married | Craft-repair | Own-child | White | Male | 0 | 0 | 38 | United-States | <=50K | <=50K
18 | 21 | ? | 152328 | Some-college | 10 | Never-married | ? | Own-child | White | Male | 0 | 0 | 20 | United-States | <=50K | <=50K

Showing all 340 rows.
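
The Spark cell above passes every column named in the model's input schema to the UDF, so each of those columns must exist in the DataFrame. A minimal sketch of a pre-check, written in plain Python so it runs without a cluster; the helper name missing_features and the sample column lists are hypothetical, and in the notebook the two lists would be test_df.columns and features.

```python
from typing import List


def missing_features(df_columns: List[str], features: List[str]) -> List[str]:
    """Return the feature names that are absent from the DataFrame columns, in schema order."""
    present = set(df_columns)
    return [name for name in features if name not in present]


# Hypothetical inputs for illustration only
df_columns = ["age", "workclass", "income"]
features = ["age", "workclass", "education"]

missing = missing_features(df_columns, features)
if missing:
    print(f"Missing feature columns: {missing}")
```

Running such a check before calling the UDF may give a clearer error message than the column-resolution failure Spark would otherwise raise.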

Test

Finally, running the prediction on the holdout test set gives an estimate of how well the model would perform in a production setting. The confusion matrix below shows the breakdown between correct and incorrect predictions for each class.

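import mlflow.sklearn
import sklearn.metrics
model = mlflow.sklearn.load_model(model_uri)  # native scikit-learn model, required by the plotting helper
sklearn.metrics.plot_confusion_matrix(model, X_test, y_test)  # X_test and y_test come from the pandas cell above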
Out[9]:
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x7ffafa1ded60>
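
To make explicit what the confusion matrix counts, here is a minimal sketch in plain Python that tallies (actual, predicted) label pairs. The function name confusion_counts and the toy labels are assumptions for illustration; they are not results from this notebook.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (actual, predicted) label pair occurs."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return Counter(zip(y_true, y_pred))


# Toy labels, made up for this example
y_true = ["<=50K", "<=50K", ">50K", ">50K", "<=50K"]
y_pred = ["<=50K", ">50K", ">50K", "<=50K", "<=50K"]

counts = confusion_counts(y_true, y_pred)
# Diagonal cells of the matrix: pairs where the prediction matches the actual label
correct = sum(n for (actual, predicted), n in counts.items() if actual == predicted)
accuracy = correct / len(y_true)

for (actual, predicted), n in sorted(counts.items()):
    print(f"actual={actual!r} predicted={predicted!r}: {n}")
print(f"accuracy: {accuracy:.2f}")
```

Each key of counts corresponds to one cell of the plotted matrix, with matching pairs on the diagonal.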