These features and Databricks platform improvements were released in April 2019.
Releases are staged. Your Databricks account may not be updated for up to a week after the initial release date.
April 25, 2019
Managed MLflow on Databricks is now generally available. MLflow on Databricks offers a hosted version of MLflow fully integrated with the Databricks security model and interactive workspace. See MLflow.
April 24, 2019
Databricks has open sourced the Delta Lake project. Delta Lake is a storage layer that brings reliability to data lakes built on HDFS and cloud storage by providing ACID transactions through optimistic concurrency control between writes and snapshot isolation for consistent reads during writes. Delta Lake also provides built-in data versioning for easy rollbacks and reproducing reports.
What was previously called Databricks Delta is now the Delta Lake open source project plus optimizations available on Databricks. See Delta Lake.
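To make the terms above concrete, here is a minimal in-memory sketch of optimistic concurrency control between writers, snapshot reads, and reading an older version. It is a toy model under stated assumptions, not Delta Lake's implementation or API: the names `ToyVersionedTable`, `CommitConflict`, `read`, and `commit` are hypothetical and exist only for illustration.

```python
import threading
from typing import Dict, List, Optional


class CommitConflict(Exception):
    """Raised when another writer committed after this writer's read."""


class ToyVersionedTable:
    """Hypothetical in-memory table that keeps every committed version."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        # Version 0 is the empty table; each commit appends a new snapshot.
        self._versions: List[Dict[str, int]] = [{}]

    def latest_version(self) -> int:
        with self._lock:
            return len(self._versions) - 1

    def read(self, version: Optional[int] = None) -> Dict[str, int]:
        """Return a copy of one committed version (latest by default).

        Readers only ever see fully committed versions, so a read is
        consistent even while writers are preparing new commits.
        """
        with self._lock:
            if version is None:
                version = len(self._versions) - 1
            return dict(self._versions[version])

    def commit(self, read_version: int, changes: Dict[str, int]) -> int:
        """Optimistically commit changes that were based on read_version.

        The writer does its work without holding a lock; only this final
        check-and-append step is atomic. This toy rejects every stale
        commit, which is a simplifying assumption.
        """
        with self._lock:
            latest = len(self._versions) - 1
            if read_version != latest:
                raise CommitConflict(
                    f"based on version {read_version}, latest is {latest}"
                )
            snapshot = dict(self._versions[latest])
            snapshot.update(changes)
            self._versions.append(snapshot)
            return latest + 1


table = ToyVersionedTable()
table.commit(read_version=0, changes={"a": 1})          # creates version 1
table.commit(read_version=1, changes={"a": 2, "b": 3})  # creates version 2

try:
    # This writer still believes version 1 is the latest, so it is rejected.
    table.commit(read_version=1, changes={"c": 4})
    stale_write_rejected = False
except CommitConflict:
    stale_write_rejected = True

old_snapshot = table.read(version=1)  # earlier version, still readable
current_snapshot = table.read()       # latest committed version
```

In this sketch a rejected writer would re-read the latest version and try again. Keeping every version is what allows an earlier state to be read back for a rollback or to reproduce a report.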
April 9 - 16, 2019: Version 2.95
Databricks now provides Beta support for the Amazon EC2 C5d series.
April 3, 2019
Databricks Runtime 5.3 is now generally available. Databricks Runtime 5.3 includes new Delta Lake features and upgrades, and upgraded Python, R, Java, and Scala libraries.
Major upgrades include:
- Databricks Delta time travel GA
- MySQL table replication to Delta, Public Preview
- Notebook-scoped library improvements
- New Databricks Advisor hints
For details, see Databricks Runtime 5.3 (Unsupported).
April 3, 2019
With Databricks Runtime 5.3 for Machine Learning, we have achieved our first GA of Databricks Runtime ML! Databricks Runtime ML provides a ready-to-go environment for machine learning and data science. It builds on Databricks Runtime and adds many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost. It also supports distributed training using Horovod.
This version is built on Databricks Runtime 5.3, with additional libraries, some different library versions, and Conda package management for Python libraries. Major new features since Databricks Runtime 5.2 ML Beta include:
- MLlib integration with MLflow (Private Preview), which provides automatic logging of MLflow runs for models fit using the PySpark tuning algorithms. If you want to participate in the preview, contact your Databricks account representative.
- Upgrades to the PyArrow, Horovod, and TensorboardX libraries. The PyArrow update adds the ability to use BinaryType when you perform Arrow-based conversion and makes it available in pandas UDF.