May 2018

Releases are staged. Your Databricks account may not be updated until a week after the initial release date.

General Data Protection Regulation (GDPR)

May 22-31, 2018: Version 2.72

To meet the requirements of the European Union General Data Protection Regulation (GDPR), which goes into effect on May 25, 2018, we have made a number of modifications to the Databricks platform to provide you with more control of data retention at both the account and user level. Updates include:

  • Cluster delete: permanently delete a cluster configuration using the UI or the Clusters API. See Delete a compute.

  • Workspace purge (released in version 2.71): permanently delete workspace objects, such as entire notebooks, individual notebook cells, individual notebook comments, and notebook revision history. See Purge workspace storage.

  • Notebook revision history purge:

    • Permanently delete the revision history of all notebooks in a workspace for a defined time frame. See Purge workspace storage.

    • Permanently delete a single notebook revision or the entire revision history of a notebook. See Version history.
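The cluster delete item above can also be scripted against the Clusters API. The following is a minimal sketch, not an official client: it assumes the `clusters/permanent-delete` endpoint of Clusters API 2.0 and bearer-token authentication, and the workspace URL, token, and cluster ID are placeholders. The helper name `build_permanent_delete_request` is hypothetical. The request is built but not sent, so you can inspect it first.

```python
import json
import urllib.request


def build_permanent_delete_request(host, token, cluster_id):
    """Build (but do not send) a request that permanently deletes a cluster.

    Assumes the Clusters API 2.0 endpoint clusters/permanent-delete;
    check the API reference for your workspace before relying on it.
    """
    url = f"{host.rstrip('/')}/api/2.0/clusters/permanent-delete"
    body = json.dumps({"cluster_id": cluster_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values: replace with your workspace URL, token, and cluster ID.
request = build_permanent_delete_request(
    "https://example.cloud.databricks.com",
    "<personal-access-token>",
    "1234-567890-abcde123",
)
print(request.get_method(), request.full_url)
# To actually send the request: urllib.request.urlopen(request)
```

Permanent deletion cannot be undone, so building and reviewing the request separately from sending it is a deliberate choice in this sketch.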

Account management features allow you to cancel subscriptions and delete your account:

  • Cancel your Databricks subscription. By default, permanent data purge happens 30 days after you cancel a workspace subscription.

  • Cancel any Community Edition subscription associated with your account owner username, separately from canceling your free trial or paid standard Databricks subscription.

  • Delete your Databricks account, including your login credentials and billing information.

For details, see Manage your subscription.

HorovodEstimator

May 29, 2018: Version 2.72

Added documentation and a notebook for HorovodEstimator, an MLlib-style estimator API that leverages Uber’s Horovod framework. HorovodEstimator facilitates distributed, multi-GPU training of deep neural networks on Spark DataFrames, simplifying the integration of ETL in Spark with model training in TensorFlow.

MLeap ML Model Export

May 22-31, 2018: Version 2.72

Added documentation and notebooks on using MLeap on Databricks. MLeap allows you to deploy machine learning pipelines from Apache Spark and scikit-learn to a portable format and execution engine. See MLeap ML model export.

Notebook cells: hide and show

May 22-31, 2018: Version 2.72

New indicators and messaging make it easier to show notebook cell contents after they’ve been hidden. See Hide and show cell content.

Databricks Runtime 4.1 for Machine Learning (Beta)

May 18, 2018

Databricks Runtime ML (Beta) provides a ready-to-go environment for machine learning and data science. It contains multiple popular libraries, including TensorFlow, Keras, and XGBoost. It also supports distributed TensorFlow training using Horovod.

Databricks Runtime ML lets you start a Databricks cluster with all of the libraries required for distributed TensorFlow training. It ensures the compatibility of the libraries included on the cluster (between TensorFlow and CUDA / cuDNN, for example) and substantially decreases the cluster start-up time compared to using init scripts.

Note

Upon GA, Databricks Runtime 4.1 ML will require a Databricks plan that includes the Databricks Operational Security package. Databricks Runtime 4.1 ML is currently available on standard plans without the Operational Security package. It is not available on Community Edition accounts. Note also that the Operational Security requirement may be enforced at any point during the beta period, before GA. If so, we will communicate the change in advance.

See the complete release notes for Databricks Runtime 4.1 ML (EoS).

New GPU cluster types

May 10 - May 17, 2018: Version 2.71

We’re pleased to announce support for AWS EC2 P3 instance types on Databricks clusters. P3 instances provide industry-leading GPUs to power image processing, text analysis, and other machine learning and deep learning tasks that are computationally challenging and demand superior performance. As part of this new instance type rollout, we have significantly decreased the cost of clusters running on P2 instances.

Databricks also provides pre-installed NVIDIA drivers and libraries configured for GPUs, along with material for getting started with several popular deep learning libraries.

Secret management

May 10 - May 30, 2018: Version 2.71

Databricks now provides powerful tools for managing the credentials you need for authenticating to external data sources. Instead of typing your credentials directly into a notebook, use Databricks secret management to store and reference your credentials in notebooks and jobs. To manage secrets, you can use the Databricks CLI (legacy) to access the Secrets API.

Note

Secret management requires Databricks Runtime 4.0 or above and Databricks CLI 0.7.1 or above. This feature is being rolled out gradually to all Databricks accounts over the course of May.

See Secret management.
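As an illustration of the workflow described above, the following sketch builds the two Secrets API calls that create a scope and store a credential in it. It assumes the `secrets/scopes/create` and `secrets/put` endpoints of API 2.0 and bearer-token authentication. The helper name `secrets_request`, the scope `jdbc`, the key `password`, and all host and token values are hypothetical placeholders. The requests are built but not sent.

```python
import json
import urllib.request

HOST = "https://example.cloud.databricks.com"  # placeholder workspace URL
TOKEN = "<personal-access-token>"              # placeholder token


def secrets_request(endpoint, payload):
    """Build (but do not send) a POST request to a Secrets API endpoint."""
    return urllib.request.Request(
        f"{HOST}/api/2.0/secrets/{endpoint}",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Content-Type": "application/json",
        },
    )


# 1. Create a scope to group related secrets.
create_scope = secrets_request("scopes/create", {"scope": "jdbc"})

# 2. Store a credential in that scope under a key.
put_secret = secrets_request(
    "put",
    {"scope": "jdbc", "key": "password", "string_value": "example-password"},
)

for req in (create_scope, put_secret):
    print(req.get_method(), req.full_url)
# To actually send a request: urllib.request.urlopen(req)
```

In a notebook or job, the stored value would then typically be read with `dbutils.secrets.get(scope="jdbc", key="password")` rather than typed into the code.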

Cluster pinning

May 10 - May 17, 2018: Version 2.71

You can now pin a cluster to the Clusters list. Pinning lets you retain the configuration of a cluster even after it has been terminated for more than 30 days.

In addition, the Clusters page now displays all clusters that were terminated within the past 30 days (increased from 7 days).

See Pin a compute.

Cluster autostart

May 10 - May 17, 2018: Version 2.71

Before this release, jobs scheduled to run on Terminated clusters failed. For clusters created in Databricks version 2.71 and above, a command from a JDBC/ODBC interface or a job run assigned to an existing terminated cluster automatically restarts that cluster. See JDBC connect and Configure and edit Databricks Jobs.

Autostart allows you to configure clusters to autoterminate, without requiring manual intervention to restart the clusters for scheduled jobs. Furthermore, you can schedule cluster initialization by scheduling a job that restarts terminated clusters at a specified time.

Cluster access control is enforced and job owner permissions are checked as usual.
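To illustrate scheduling cluster initialization with a job, the sketch below builds a job definition that targets an existing cluster on a cron schedule. It assumes the Jobs API 2.0 `jobs/create` payload fields `existing_cluster_id`, `notebook_task`, and `schedule`. The function name `scheduled_restart_job`, the job name, cluster ID, and notebook path are hypothetical placeholders. It only constructs and prints the payload.

```python
import json


def scheduled_restart_job(cluster_id, notebook_path, cron, timezone="UTC"):
    """Return a job definition that runs a notebook on an existing cluster.

    With autostart, a scheduled run against a terminated cluster should
    restart that cluster before the notebook executes.
    """
    return {
        "name": "Warm up cluster",  # hypothetical job name
        "existing_cluster_id": cluster_id,
        "notebook_task": {"notebook_path": notebook_path},
        "schedule": {
            "quartz_cron_expression": cron,
            "timezone_id": timezone,
        },
    }


# Placeholder cluster ID and notebook path; runs at 07:00 on weekdays.
job = scheduled_restart_job(
    "1234-567890-abcde123",
    "/Shared/warm-up",
    "0 0 7 ? * MON-FRI",
)
print(json.dumps(job, indent=2))
```

The payload would be sent as the body of a `POST` to `/api/2.0/jobs/create`. The referenced notebook can be trivial, since in this sketch its only purpose is to trigger the restart.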

Workspace purging

May 10 - May 17, 2018: Version 2.71

As part of our ongoing effort to comply with the European Union General Data Protection Regulation (GDPR), we have added the ability to purge workspace objects, such as entire notebooks, individual notebook cells, individual notebook comments, and notebook revision history. We will release more functionality and documentation to support GDPR compliance in the coming weeks.

See Purge workspace storage.

Databricks CLI 0.7.1

May 10, 2018

Databricks CLI 0.7.1 includes updates to Secrets commands.

See Databricks CLI (legacy) and Secret management.

Display() support for image data types

May 8, 2018

In Databricks Runtime 4.1, display() now renders columns containing image data types as rich HTML.

See Images.

Databricks Delta update

May 8, 2018

Databricks Runtime 4.1 includes a major upgrade to Databricks Delta. Databricks highly recommends that all Delta customers upgrade to the new runtime. This release remains in Private Preview, but it represents a candidate release in anticipation of the upcoming GA release.

For more information, see Databricks Runtime 4.1 (EoS) and What is Delta Lake?

S3 Select connector

May 8, 2018

Databricks Runtime 4.1 includes a new Amazon S3 Select data source connector. See S3 Select and Databricks Runtime 4.1 (EoS).