April 2019

These features and Databricks platform improvements were released in April 2019.

Note

Releases are staged. It can take up to a week after the initial release date for your Databricks account to be updated.

MLflow on Databricks (GA)

April 25, 2019

Managed MLflow on Databricks is now generally available. MLflow on Databricks offers a hosted version of MLflow fully integrated with the Databricks security model and interactive workspace. See ML lifecycle management using MLflow.

Delta Lake on Databricks

April 24, 2019

Databricks has open sourced the Delta Lake project. Delta Lake is a storage layer that brings reliability to data lakes built on HDFS and cloud storage. It provides ACID transactions, using optimistic concurrency control between writes and snapshot isolation so that reads stay consistent while writes are in progress. Delta Lake also provides built-in data versioning, which makes it easy to roll back changes and reproduce reports.

Note

What was previously called Databricks Delta is now the Delta Lake open source project plus optimizations available on Databricks. See What is Delta Lake?.
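The optimistic concurrency control, snapshot isolation, and data versioning that Delta Lake provides can be illustrated with a small in-memory sketch. This is a toy model only, not Delta Lake's implementation or API: the names ToyVersionedTable, CommitConflict, and append_with_retry are hypothetical, and the real conflict checks are more involved than the simple version comparison used here.

```python
import threading


class CommitConflict(Exception):
    """Raised when another writer committed after this writer's read version."""


class ToyVersionedTable:
    """Hypothetical in-memory table; each commit creates a new immutable version."""

    def __init__(self):
        self._versions = [()]          # version 0 is the empty table
        self._lock = threading.Lock()  # guards only the short commit step

    def latest_version(self):
        return len(self._versions) - 1

    def snapshot(self, version=None):
        """Return the rows visible at `version` (latest if omitted)."""
        if version is None:
            version = self.latest_version()
        return self._versions[version]

    def commit(self, read_version, new_rows):
        """Append rows only if nothing was committed since `read_version`."""
        with self._lock:
            if read_version != self.latest_version():
                raise CommitConflict(
                    f"read version {read_version}, "
                    f"table is at {self.latest_version()}"
                )
            # Old versions are never mutated, so existing snapshots stay valid.
            self._versions.append(self._versions[-1] + tuple(new_rows))
            return self.latest_version()


def append_with_retry(table, new_rows, max_attempts=3):
    """Optimistic write loop: re-read the latest version and retry on conflict."""
    for _ in range(max_attempts):
        read_version = table.latest_version()
        try:
            return table.commit(read_version, new_rows)
        except CommitConflict:
            continue
    raise CommitConflict("gave up after repeated conflicts")
```

In this sketch, a writer that holds a stale read version is rejected with CommitConflict instead of overwriting the other writer's commit, and append_with_retry re-reads the latest version before trying again. A reader that took a snapshot before a commit keeps seeing the same rows, and passing an older version number to snapshot reads the table as it was at that version.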

MLflow runs sidebar

April 9 - 16, 2019: Version 2.95

You can now view the MLflow runs and the notebook revisions that produced these runs in a sidebar next to your notebook. In the notebook’s right sidebar, click the Experiment icon.

See Create notebook experiment.

C5d series Amazon EC2 instance types (Beta)

April 9 - 16, 2019: Version 2.95

Databricks now provides Beta support for the Amazon EC2 C5d series.

Databricks Runtime 5.3 (GA)

April 3, 2019

Databricks Runtime 5.3 is now generally available. Databricks Runtime 5.3 includes new Delta Lake features and upgrades, and upgraded Python, R, Java, and Scala libraries.

Major upgrades include:

  • Databricks Delta time travel (GA)

  • MySQL table replication to Delta (Public Preview)

  • Notebook-scoped library improvements

  • New Databricks Advisor hints

For details, see Databricks Runtime 5.3 (unsupported).

Databricks Runtime 5.3 ML (GA)

April 3, 2019

With Databricks Runtime 5.3 for Machine Learning, we have achieved our first GA of Databricks Runtime ML! Databricks Runtime ML provides a ready-to-go environment for machine learning and data science. It builds on Databricks Runtime and adds many popular machine learning libraries, including TensorFlow, PyTorch, Keras, and XGBoost. It also supports distributed training using Horovod.

This version is built on Databricks Runtime 5.3, with additional libraries, some different library versions, and Conda package management for Python libraries. Major new features since Databricks Runtime 5.2 ML Beta include:

  • MLlib integration with MLflow (Private Preview), which provides automatic logging of MLflow runs for models fit using the PySpark tuning algorithms CrossValidator and TrainValidationSplit.

    If you want to participate in the preview, contact your Databricks account team.

  • Upgrades to the PyArrow, Horovod, and TensorboardX libraries.

    The PyArrow update adds the ability to use BinaryType when you perform Arrow-based conversion and makes it available in pandas UDFs.

For more information, see Databricks Runtime 5.3 ML (unsupported). For instructions on creating a Databricks Runtime ML cluster, see AI and Machine Learning on Databricks.