February 2019
These features and Databricks platform improvements were released in February 2019.
Note
Releases are staged. Your Databricks account may not be updated until up to a week after the initial release date.
Managed MLflow on Databricks Public Preview
February 26 - March 5, 2019: Version 2.92
MLflow is an open source platform for managing the end-to-end machine learning lifecycle. It tackles three primary functions:
Tracking experiments to record and compare parameters and results.
Managing and deploying models from a variety of ML libraries to a variety of model serving and inference platforms.
Packaging ML code in a reusable, reproducible form to share with other data scientists or transfer to production.
Databricks now provides a fully managed and hosted version of MLflow integrated with enterprise security features, high availability, and other Databricks workspace features such as experiment management, run management, and notebook revision capture. MLflow on Databricks offers an integrated experience for tracking and securing machine learning model training runs and running machine learning projects. By using managed MLflow on Databricks, you get the advantages of both platforms, including:
Workspaces: Collaboratively track and organize experiments and results within Databricks Workspaces with a hosted MLflow Tracking Server and integrated experiment UI. When you use MLflow in notebooks, Databricks automatically captures notebook revisions so you can reproduce the same code and runs later.
Security: Take advantage of one common security model for the entire ML lifecycle via ACLs.
Jobs: Run MLflow projects as Databricks jobs remotely and directly from Databricks notebooks.
For details, see Track ML and deep learning training runs and Run MLflow Projects on Databricks.
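As a rough illustration of the tracking function described above, the following sketch records one run's parameters and results. It assumes the open source MLflow Python calls start_run, log_param, and log_metric. The helper log_training_run and the InMemoryTracker stand-in are hypothetical names introduced here so that the sketch runs without MLflow installed; they are not part of MLflow or Databricks.

```python
from contextlib import contextmanager


def log_training_run(tracker, params, metrics):
    """Record one training run's parameters and results.

    `tracker` is expected to be the `mlflow` module, or any object that
    exposes the same start_run / log_param / log_metric calls.
    """
    with tracker.start_run():
        for name, value in params.items():
            tracker.log_param(name, value)
        for name, value in metrics.items():
            tracker.log_metric(name, value)


class InMemoryTracker:
    """Hypothetical stand-in for `mlflow` that keeps runs in a list."""

    def __init__(self):
        self.runs = []
        self._current = None

    @contextmanager
    def start_run(self):
        # Open a new run and store it once the `with` block exits.
        self._current = {"params": {}, "metrics": {}}
        try:
            yield self._current
        finally:
            self.runs.append(self._current)
            self._current = None

    def log_param(self, key, value):
        self._current["params"][key] = value

    def log_metric(self, key, value):
        self._current["metrics"][key] = value


# Illustrative parameter and metric values only.
tracker = InMemoryTracker()
log_training_run(tracker, {"alpha": 0.5}, {"rmse": 0.79})
print(tracker.runs)
```

In a Databricks notebook with MLflow available, you would likely pass the imported mlflow module in place of the stand-in, for example log_training_run(mlflow, params, metrics).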
Azure Data Lake Storage Gen2 connector is generally available
February 15, 2019
Azure Data Lake Storage Gen2 (ADLS Gen2), the next-generation data lake solution for big data analytics, is now GA, as is the ADLS Gen2 connector for Databricks. We are also pleased to announce that ADLS Gen2 supports Databricks Delta when you are running clusters on Databricks Runtime 5.2 and above.
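A minimal sketch of how ADLS Gen2 locations are typically addressed from a cluster, assuming the abfss:// URI scheme and the account-key Spark configuration property used by the Azure Blob File System driver. Neither is spelled out in this note, and the container, storage account, and path names are hypothetical.

```python
def abfss_uri(container, storage_account, path=""):
    """Return an abfss:// URI for a path in an ADLS Gen2 container."""
    host = f"{storage_account}.dfs.core.windows.net"
    return f"abfss://{container}@{host}/{path.lstrip('/')}"


def account_key_conf(storage_account):
    """Return the Spark configuration key that holds the account access key."""
    return f"fs.azure.account.key.{storage_account}.dfs.core.windows.net"


# Hypothetical container, account, and path names.
uri = abfss_uri("events", "examplestore", "/delta/clicks")
conf_key = account_key_conf("examplestore")
print(uri)
print(conf_key)
```

On a cluster, the configuration key would be set with spark.conf.set together with your access key, and the URI could then be passed to a reader such as spark.read.format("delta").load(uri).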
Python 3 now the default when you create clusters
February 12-19, 2019: Version 2.91
The default Python version for clusters created using the UI has switched from Python 2 to Python 3. The default for clusters created using the REST API is still Python 2.
Existing clusters keep their current Python version. However, if you have relied on the Python 2 default when creating clusters in the UI, you must now select Python 2 explicitly.
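Because the REST API default is still Python 2, a cluster created through the API needs Python 3 requested explicitly. The sketch below builds a request body for the clusters/create endpoint, assuming the spark_env_vars field and the /databricks/python3/bin/python3 interpreter path; neither appears in this note, so confirm both against the Clusters API reference. The helper name python3_cluster_spec and all argument values are illustrative.

```python
import json


def python3_cluster_spec(cluster_name, spark_version, node_type_id, num_workers):
    """Build a request body for POST /api/2.0/clusters/create that pins Python 3."""
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Assumption: setting PYSPARK_PYTHON selects the Python 3 interpreter.
        # Without it, API-created clusters use the Python 2 default.
        "spark_env_vars": {"PYSPARK_PYTHON": "/databricks/python3/bin/python3"},
    }


# All argument values below are placeholders, not recommendations.
spec = python3_cluster_spec("py3-example", "5.2.x-scala2.11", "i3.xlarge", 2)
print(json.dumps(spec, indent=2))
```

The resulting JSON would be sent as the body of an authenticated POST request to your workspace's clusters/create endpoint.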
Additional cluster instance types
February 12-19, 2019: Version 2.91
Databricks now provides Beta support for the following Amazon EC2 instance types:
c5.18xlarge
r5.24xlarge
r4.16xlarge
m5.24xlarge
Delta Lake generally available
February 1, 2019
Now everyone can get the benefits of Databricks Delta’s powerful transactional storage layer and super-fast reads: as of February 1, Delta Lake is GA and available on all supported versions of Databricks Runtime. For information about Delta Lake, see What is Delta Lake?