February 2019

These features and Databricks platform improvements were released in February 2019.

Note

Releases are staged. Your Databricks account may not be updated for up to a week after the initial release date.

Managed MLflow on Databricks Public Preview

February 26 - March 5, 2019: Version 2.92

MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It tackles three primary functions:

  • Tracking experiments to record and compare parameters and results.
  • Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms.
  • Packaging ML code in a reusable, reproducible form to share with other data scientists or transfer to production.
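As a rough illustration of the tracking function, the sketch below logs parameters and metrics for one run. It relies only on the MLflow calls `start_run`, `log_param`, and `log_metric`. The helper name `log_training_run`, the `tracker` argument, and the sample values are hypothetical and chosen for illustration. In practice you would pass the imported `mlflow` module as `tracker`.

```python
def log_training_run(tracker, params, metrics):
    """Record one training run's parameters and metrics.

    `tracker` is assumed to expose MLflow's tracking calls
    (start_run, log_param, log_metric); the `mlflow` module does.
    """
    # start_run() returns a context manager that ends the run on exit.
    with tracker.start_run():
        for name, value in params.items():
            tracker.log_param(name, value)
        for name, value in metrics.items():
            tracker.log_metric(name, value)


# Typical use where MLflow is installed (sample values are made up):
#   import mlflow
#   log_training_run(mlflow, {"alpha": 0.5}, {"rmse": 0.78})
```

Passing the tracking module as an argument is a design choice for this sketch only. It lets the helper be exercised without an MLflow installation, and calling `mlflow` directly works just as well.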

Databricks now provides a fully managed and hosted version of MLflow integrated with enterprise security features, high availability, and other Databricks workspace features such as experiment management, run management, and notebook revision capture. MLflow on Databricks offers an integrated experience for tracking and securing machine learning model training runs and running machine learning projects. By using managed MLflow on Databricks, you get the advantages of both platforms, including:

  • Workspaces: Collaboratively track and organize experiments and results within Databricks Workspaces with a hosted MLflow Tracking Server and integrated experiment UI. When you use MLflow in notebooks, Databricks automatically captures notebook revisions so you can reproduce the same code and runs later.
  • Security: Take advantage of one common security model for the entire ML lifecycle via ACLs.
  • Jobs: Run MLflow projects as Databricks jobs remotely and directly from Databricks notebooks.

Here’s a demo of a tracking workflow in a Databricks Workspace:

../../../_images/mlflow-exp.gif

For details, see Experiments and Reproducible Runs with MLflow Projects.

Azure Data Lake Storage Gen2 connector is generally available

February 15, 2019

Azure Data Lake Storage Gen2 (ADLS Gen2), the next-generation data lake solution for big data analytics, is now generally available (GA), as is the ADLS Gen2 connector for Databricks. ADLS Gen2 also supports Delta Lake on clusters running Databricks Runtime 5.2 and above.
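The following sketch shows one way a notebook might address ADLS Gen2 storage through the connector. It assumes account-key authentication and the `abfss://` URI scheme. The helper names, the storage account `myaccount`, the container `data`, and the path `events` are all hypothetical placeholders.

```python
def abfss_uri(container, account, path=""):
    """Build an abfss:// URI for a path in an ADLS Gen2 container."""
    return "abfss://{}@{}.dfs.core.windows.net/{}".format(
        container, account, path.lstrip("/")
    )


def account_key_conf(account):
    """Name of the Spark configuration key that holds the account access key."""
    return "fs.azure.account.key.{}.dfs.core.windows.net".format(account)


print(abfss_uri("data", "myaccount", "events"))
print(account_key_conf("myaccount"))

# In a Databricks notebook, where `spark` and a DataFrame `df` already exist,
# these values might be used like this (the key value is elided on purpose):
#   spark.conf.set(account_key_conf("myaccount"), "<storage-account-access-key>")
#   df.write.format("delta").save(abfss_uri("data", "myaccount", "events"))
```

The commented lines at the end write a Delta table to the ADLS Gen2 path. They need a running cluster, so they are not executed here.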

Python 3 now the default when you create clusters

February 12-19, 2019: Version 2.91

The default Python version for clusters created using the UI has switched from Python 2 to Python 3. The default for clusters created using the REST API is still Python 2.

Existing clusters keep their current Python versions. However, if you have been relying on the Python 2 default when creating clusters, you now need to select the Python version explicitly.

../../../_images/python-version-3.png

See Python Clusters.
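Because clusters created through the REST API still default to Python 2, a request has to ask for Python 3 explicitly. The sketch below builds a request body for the Clusters API create endpoint (`/api/2.0/clusters/create`) and assumes that setting `PYSPARK_PYTHON` in `spark_env_vars` selects the Python 3 interpreter. The helper name, cluster name, runtime version, and node type are illustrative placeholders, and the code only builds the payload without sending it.

```python
import json

# Assumed location of the Python 3 interpreter on cluster nodes.
PYTHON3_PATH = "/databricks/python3/bin/python3"


def python3_cluster_spec(name, spark_version, node_type_id, num_workers):
    """Return a cluster-create request body that selects Python 3."""
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Without this entry, an API-created cluster uses Python 2.
        "spark_env_vars": {"PYSPARK_PYTHON": PYTHON3_PATH},
    }


# Placeholder values; substitute ones that are valid in your workspace.
spec = python3_cluster_spec("py3-example", "5.2.x-scala2.11", "i3.xlarge", 2)
print(json.dumps(spec, indent=2))
```

The printed JSON can be sent as the body of an authenticated POST request to the create endpoint using whichever HTTP client you prefer.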

Additional cluster instance types

February 12-19, 2019: Version 2.91

Databricks now provides Beta support for the following Amazon EC2 instance types:

  • c5.18xlarge
  • r5.24xlarge
  • r4.16xlarge
  • m5.24xlarge

Delta Lake generally available

February 1, 2019

As of February 1, 2019, Delta Lake is generally available (GA) on all supported versions of Databricks Runtime, so everyone can use its transactional storage layer and fast reads. For information about Delta Lake, see the Delta Lake Guide.