April 2019
These features and Databricks platform improvements were released in April 2019.
Note
Releases are staged. Your Databricks account may not be updated until up to a week after the initial release date.
MLflow on Databricks (GA)
April 25, 2019
Managed MLflow on Databricks is now generally available. MLflow on Databricks offers a hosted version of MLflow fully integrated with the Databricks security model and interactive workspace. See ML lifecycle management using MLflow.
Delta Lake on Databricks
April 24, 2019
Databricks has open sourced the Delta Lake project. Delta Lake is a storage layer that brings reliability to data lakes built on HDFS and cloud storage by providing ACID transactions through optimistic concurrency control between writes and snapshot isolation for consistent reads during writes. Delta Lake also provides built-in data versioning for easy rollbacks and reproducing reports.
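The ideas in the paragraph above (optimistic concurrency control between writes, snapshot isolation for reads, and versioned data for rollbacks) can be illustrated with a small in-memory toy. This is only a conceptual sketch, not Delta Lake code or its API: `ToyTable`, `CommitConflict`, `snapshot`, and `commit` are hypothetical names invented for illustration. It also assumes the simplest possible conflict rule, rejecting any commit made against a stale version.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


class CommitConflict(Exception):
    """Raised when another writer committed after this writer's read."""


@dataclass
class ToyTable:
    """Hypothetical in-memory stand-in for a versioned table (not Delta Lake)."""

    # Each list entry is one committed snapshot; its index is the version number.
    _versions: List[Dict[str, int]] = field(default_factory=lambda: [{}])

    @property
    def latest_version(self) -> int:
        return len(self._versions) - 1

    def snapshot(self, version: Optional[int] = None) -> Dict[str, int]:
        """Return a copy of the table as of `version` (default: latest).

        Readers get a copy, so later commits never change what they see.
        """
        v = self.latest_version if version is None else version
        if not 0 <= v <= self.latest_version:
            raise ValueError(f"unknown version {v}")
        return dict(self._versions[v])

    def commit(self, read_version: int, changes: Dict[str, int]) -> int:
        """Optimistically commit `changes`.

        The commit succeeds only if no other writer has committed since
        `read_version`; otherwise the caller must re-read and retry.
        (Assumption: any concurrent commit counts as a conflict.)
        """
        if read_version != self.latest_version:
            raise CommitConflict(
                f"read version {read_version}, table is at {self.latest_version}"
            )
        new_state = dict(self._versions[-1])
        new_state.update(changes)
        self._versions.append(new_state)
        return self.latest_version


table = ToyTable()
reader_view = table.snapshot()           # reader pins version 0
table.commit(0, {"a": 1})                # writer 1 succeeds, creating version 1
try:
    table.commit(0, {"a": 2})            # writer 2 also read version 0: conflict
except CommitConflict:
    table.commit(table.latest_version, {"a": 2})  # re-read and retry
old_state = table.snapshot(version=1)    # earlier versions stay readable
```

In this toy, the reader that took its snapshot at version 0 still sees an empty table after both commits, and version 1 remains readable after version 2 is written, which is the property that makes rollbacks and report reproduction possible.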
Note
What was previously called Databricks Delta is now the Delta Lake open source project plus optimizations available on Databricks. See What is Delta Lake?.
MLflow runs sidebar
April 9 - 16, 2019: Version 2.95
You can now view the MLflow runs and the notebook revisions that produced these runs in a sidebar next to your notebook. In the notebook’s right sidebar, click the Experiment icon.
C5d series Amazon EC2 instance types (Beta)
April 9 - 16, 2019: Version 2.95
Databricks now provides Beta support for the Amazon EC2 C5d series.
Databricks Runtime 5.3 (GA)
April 3, 2019
Databricks Runtime 5.3 is now generally available. Databricks Runtime 5.3 includes new Delta Lake features and upgrades, and upgraded Python, R, Java, and Scala libraries.
Major upgrades include:
Databricks Delta time travel GA
MySQL table replication to Delta, Public Preview
Notebook-scoped library improvements
New Databricks Advisor hints
For details, see Databricks Runtime 5.3 (EoS).
Databricks Runtime 5.3 ML (GA)
April 3, 2019
With Databricks Runtime 5.3 for Machine Learning, we have achieved our first GA of Databricks Runtime ML! Databricks Runtime ML provides a ready-to-go environment for machine learning and data science. It builds on Databricks Runtime and adds many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost. It also supports distributed training using Horovod.
This version is built on Databricks Runtime 5.3, with additional libraries, some different library versions, and Conda package management for Python libraries. Major new features since Databricks Runtime 5.2 ML Beta include:
MLlib integration with MLflow (Private Preview), which provides automatic logging of MLflow runs for models fit using the PySpark tuning algorithms CrossValidator and TrainValidationSplit. If you want to participate in the preview, contact your Databricks account team.
Upgrades to the PyArrow, Horovod, and TensorboardX libraries.
The PyArrow update adds the ability to use BinaryType when you perform Arrow-based conversion and makes it available in pandas UDFs.
For more information, see Databricks Runtime 5.3 ML (EoS). For instructions on creating a Databricks Runtime ML cluster, see AI and Machine Learning on Databricks.