June 2020

These features and Databricks platform improvements were released in June 2020.

Note

Releases are staged. Your Databricks account may not be updated until up to a week after the initial release date.

Billable usage logs delivered to your own S3 bucket (Public Preview)

June 30, 2020

Databricks account owners can now configure daily delivery of billable usage logs in CSV file format to an AWS S3 storage bucket, where you can make the data available for usage analysis. Databricks delivers a separate monthly CSV file for each workspace in your account. This CSV file includes detailed data about the workspace’s cluster usage in Databricks Units (DBUs) by cluster ID, billing SKU, cluster creator, cluster tags, and more. For a description of each CSV file column, see Download usage as a CSV file.

This file has been available for download from the Usage Overview tab in the Databricks account console, where it is accessible only by Databricks account owners. Delivery to your S3 buckets lets you give access to the users who need it and provide the data programmatically to your analysis tools, so you can view usage trends, perform chargebacks, and identify cost optimization opportunities.

For more information, see Deliver and access billable usage logs.
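As a rough illustration of the kind of usage analysis a delivered file enables, the sketch below totals DBUs per billing SKU with Python's standard `csv` module. The column names `sku` and `dbus` and the sample values are assumptions made for this example; check the column descriptions in Download usage as a CSV file before relying on them.

```python
import csv
import io
from collections import defaultdict


def dbus_by_sku(csv_text: str) -> dict:
    """Sum the DBU column per SKU for one usage CSV given as text."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # "sku" and "dbus" are assumed column names, not confirmed here.
        totals[row["sku"]] += float(row["dbus"])
    return dict(totals)


# Made-up sample rows; real files contain more columns and real SKU names.
sample = """clusterId,sku,dbus
cluster-1,SKU_A,1.5
cluster-2,SKU_B,0.5
cluster-1,SKU_A,2.0
"""

totals = dbus_by_sku(sample)
```

The same loop can be keyed on cluster ID or cluster tags instead of SKU to support chargebacks.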

Databricks Connect now supports Databricks Runtime 6.6

June 26, 2020

Databricks Connect now supports Databricks Runtime 6.6.

Databricks Runtime 7.0 ML GA

June 22, 2020

Databricks Runtime 7.0 ML is built on top of Databricks Runtime 7.0 and includes the following new features:

Notebook-scoped Python libraries and custom environments managed by conda and pip commands.
Updates for major Python packages including tensorflow, tensorboard, pytorch, xgboost, sparkdl, and hyperopt.
Newly added Python packages lightgbm, nltk, petastorm, and plotly.
RStudio Server Open Source v1.2.

For more information, see the complete Databricks Runtime 7.0 ML (EoS) release notes.

Databricks Runtime 7.0 GA, powered by Apache Spark 3.0

June 18, 2020

Databricks Runtime 7.0 is powered by Apache Spark 3.0 and now supports Scala 2.12.

Spark 3.0 brings many additional features and improvements, including:

Adaptive Query Execution, a flexible framework to do adaptive execution in Spark SQL and support changing the number of reducers at runtime.
Redesigned pandas UDFs with type hints.
Structured Streaming web UI.
Better compatibility with ANSI SQL standards.
Join hints.

Databricks Runtime 7.0 adds:

Improved Auto Loader for processing new data files incrementally as they arrive on a cloud blob store during ETL.
Improved COPY INTO command for loading data into Delta Lake with idempotent retries.
Many improvements, library additions and upgrades, and bug fixes.

For more information, see the complete Databricks Runtime 7.0 (EoS) release notes.

Databricks Runtime 7.0 for Genomics GA

June 18, 2020

Databricks Runtime 7.0 for Genomics is built on top of Databricks Runtime 7.0 and includes the following library changes:

The ADAM library has been updated from version 0.30.0 to 0.32.0.
The Hail library is not included in Databricks Runtime 7.0 for Genomics, because there is no release based on Apache Spark 3.0.

Stage-dependent access controls for MLflow models

June 16-23, 2020: Version 3.22

You can now assign stage-dependent access controls to users or groups, allowing them to manage MLflow Models registered in the MLflow Model Registry at the Staging or Production stage. We introduced two new permission levels, CAN MANAGE STAGING VERSIONS and CAN MANAGE PRODUCTION VERSIONS. Users with these permissions can perform transitions between stages allowed for the level.

For details, see MLflow model ACLs.

Notebooks now support disabling auto-scroll

June 16-23, 2020: Version 3.22

When you run a notebook cell using Shift+Enter, the default notebook behavior is to auto-scroll to the next cell if that cell is not visible. You can now disable auto-scroll in User Settings > Editor settings. If you disable auto-scroll, pressing Shift+Enter moves the focus to the next cell, but the notebook does not scroll to that cell.

Skipping instance profile validation now available in the UI

June 16-23, 2020: Version 3.22

The Add Instance Profile dialog now has a checkbox that allows you to skip validation. If validation fails, you can select this checkbox to skip the validation and forcibly add the instance profile.

Account ID is displayed in account console

June 16-23, 2020: Version 3.22

Your Databricks account ID is now displayed on the Usage Overview tab in the account console.

Internet Explorer 11 support ends on August 15

June 9, 2020

In keeping with industry trends and to ensure a stable and consistent user experience for our customers, Databricks will end support for Internet Explorer 11 on August 15, 2020.

Databricks Runtime 6.2 series support ends

June 3, 2020

Support for Databricks Runtime 6.2, Databricks Runtime 6.2 for Machine Learning, and Databricks Runtime 6.2 for Genomics ended on June 3. See Databricks support lifecycles.

Simplify and control cluster creation using cluster policies (Public Preview)

June 2-9, 2020: Version 3.21

Note

Databricks is rolling out this public preview over two releases. It may not be deployed to your workspace until the next release. Contact your Databricks account team with any questions.

Cluster policies are admin-defined, reusable cluster templates that enforce rules on cluster attributes and thus ensure that users create clusters that conform to those rules. As a Databricks admin, you can now create cluster policies and give users policy permissions. By doing that, you have more control over the resources created, give users the level of flexibility they need to do their work, and considerably simplify the cluster creation experience.

For details, see Create and manage compute policies.
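To make the idea of rules on cluster attributes concrete, here is a simplified sketch that checks a proposed cluster configuration against a policy. The rule types (`fixed`, `allowlist`, `range`) and their field names are assumptions chosen for illustration, and the checker is not how Databricks itself enforces policies.

```python
def violations(policy: dict, cluster: dict) -> list:
    """Return a human-readable message for every rule the cluster breaks."""
    problems = []
    for attr, rule in policy.items():
        value = cluster.get(attr)
        kind = rule["type"]
        if kind == "fixed":
            if value != rule["value"]:
                problems.append(f"{attr} must be {rule['value']!r}")
        elif kind == "allowlist":
            if value not in rule["values"]:
                problems.append(f"{attr} must be one of {rule['values']}")
        elif kind == "range":
            low = rule.get("minValue", float("-inf"))
            high = rule.get("maxValue", float("inf"))
            if value is None or not (low <= value <= high):
                problems.append(f"{attr} must be between {low} and {high}")
    return problems


# Hypothetical policy: cap idle time and restrict instance types.
policy = {
    "autotermination_minutes": {"type": "range", "maxValue": 60},
    "node_type_id": {"type": "allowlist", "values": ["i3.xlarge", "i3.2xlarge"]},
}

ok = violations(policy, {"autotermination_minutes": 30, "node_type_id": "i3.xlarge"})
bad = violations(policy, {"autotermination_minutes": 120, "node_type_id": "p3.16xlarge"})
```

Here `ok` is empty because both attributes satisfy their rules, while `bad` contains one message per broken rule.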

SCIM Me endpoint now returns SCIM compliant response

June 2-9, 2020: Version 3.21

The SCIM Me endpoint now returns the same information as the /users/{id} endpoint, including information such as groups, entitlements, and roles.

See CurrentUser API.
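A minimal sketch of calling the Me endpoint and reading group names from the response follows. The path `/api/2.0/preview/scim/v2/Me`, the `Accept` header value, and the workspace URL are assumptions for this example; the `groups` and `display` fields follow the general SCIM user schema. The request is only constructed here, not sent.

```python
import urllib.request


def build_me_request(workspace_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for the current user."""
    return urllib.request.Request(
        url=f"{workspace_url.rstrip('/')}/api/2.0/preview/scim/v2/Me",  # assumed path
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/scim+json",  # assumed media type
        },
        method="GET",
    )


def group_names(me_response: dict) -> list:
    """Extract group display names from a parsed SCIM user resource."""
    return [g.get("display", "") for g in me_response.get("groups", [])]


req = build_me_request("https://example.cloud.databricks.com/", "<personal-access-token>")
```

Passing `req` to `urllib.request.urlopen` and parsing the body with `json.load` would produce the dictionary that `group_names` expects.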

G4 family of GPU-accelerated EC2 instances now available for machine learning application deployments (Beta)

June 2-9, 2020: Version 3.21

G4 instances are optimized for deploying machine learning models in production. To use TensorRT on these instance types with the current releases of Databricks Runtime for Machine Learning (as of June 2, 2020), you must manually install libnvinfer using an init script. We anticipate that future GPU-enabled versions of Databricks Runtime ML will contain this package.

Deploy multiple workspaces in your Databricks account (Public Preview)

June 1, 2020

The new Multi-workspace API (renamed Account API on September 1, 2020) introduces an administrative and management layer (the account layer) on top of Databricks workspaces. It gives account owners a single place to create, configure, and manage multiple workspaces for their organization. Use the API to create one or more workspaces for each team in your organization that needs to use Databricks, or one workspace for each of your dev, staging, and production environments. Databricks provisions a ready-to-use workspace within minutes. Workspaces are completely isolated from each other. You can choose to deploy a workspace in the same underlying AWS account or in different AWS accounts, depending on your operational plan. The Multi-workspace API (Account API) is available on the accounts.cloud.databricks.com endpoint.

For more information, see Create a workspace using the Account API.

Contact your Databricks account team to request access to this public preview.
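The one-workspace-per-environment pattern could be scripted roughly as follows. This sketch only prepares the request URL and JSON body for each environment; the path `/api/2.0/accounts/{account_id}/workspaces`, the body field names, the `analytics-` naming scheme, and all placeholder IDs are assumptions, so consult Create a workspace using the Account API for the actual contract.

```python
import json

ACCOUNTS_HOST = "https://accounts.cloud.databricks.com"


def workspace_request(account_id, env, region, credentials_id, storage_configuration_id):
    """Return (url, JSON body) for creating one workspace; nothing is sent."""
    url = f"{ACCOUNTS_HOST}/api/2.0/accounts/{account_id}/workspaces"  # assumed path
    body = {
        # All field names below are assumptions for illustration.
        "workspace_name": f"analytics-{env}",
        "aws_region": region,
        "credentials_id": credentials_id,
        "storage_configuration_id": storage_configuration_id,
    }
    return url, json.dumps(body, sort_keys=True)


# One prepared request per environment, using placeholder identifiers.
requests_by_env = {
    env: workspace_request(
        "<account-id>", env, "us-west-2", "<credentials-id>", "<storage-configuration-id>"
    )
    for env in ("dev", "staging", "production")
}
```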

Deploy Databricks workspaces in your own VPC (Public Preview)

June 1, 2020

By default, clusters are created in a single AWS VPC (Virtual Private Cloud) that Databricks creates and configures in your AWS account. You now have the option to create your Databricks workspaces in your own VPC, a feature known as customer-managed VPC. This gives you more control over the infrastructure and can help you comply with the cloud security and governance standards your organization may require. To use it, provide your VPC ID, security group ID, and subnet IDs when you create your workspace using the Multi-workspace API (Account API).

For more information, see Configure a customer-managed VPC.

This feature is available only on the E2 version of the Databricks platform, not on the existing enterprise platform.
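The three identifiers a customer-managed VPC requires can be collected into a single payload before calling the API. The sketch below validates that none is missing and builds a dictionary; the field names (`network_name`, `vpc_id`, `subnet_ids`, `security_group_ids`) and the sample IDs are assumptions for illustration rather than a confirmed request schema.

```python
def network_config(name, vpc_id, subnet_ids, security_group_ids):
    """Build a network configuration payload, rejecting missing identifiers."""
    if not vpc_id or not subnet_ids or not security_group_ids:
        raise ValueError("vpc_id, subnet_ids, and security_group_ids are all required")
    return {
        # Field names are assumed; see Configure a customer-managed VPC.
        "network_name": name,
        "vpc_id": vpc_id,
        "subnet_ids": list(subnet_ids),
        "security_group_ids": list(security_group_ids),
    }


# Placeholder AWS identifiers, not real resources.
config = network_config(
    "analytics-vpc", "vpc-0abc", ["subnet-0aaa", "subnet-0bbb"], ["sg-0ccc"]
)
```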

Secure cluster connectivity with no open ports on your VPCs and no public IP addresses on Databricks workers (Public Preview)

June 1, 2020

With the release of the E2 version of the Databricks platform, Databricks is providing a new network architecture for connectivity between the Databricks control plane (delivered as SaaS) and the compute plane (your own AWS VPC). With this new architecture, you no longer have to open inbound ports on cluster VMs: cluster VMs launched in your customer-managed VPC now initiate an outbound TLS 1.2 connection to the Databricks control plane. This architecture is compliant with common InfoSec requirements. It also eliminates the need for VPC peering and gives you more flexibility in how you connect your environment to the Databricks control plane.

For more information, see What is secure cluster connectivity?

Contact your Databricks account team to request access to this public preview.

Restrict access to Databricks using IP access lists (Public Preview)

June 1, 2020

Databricks workspaces can now be configured so that users connect to the service only through existing corporate networks with a secure perimeter. Databricks admins can use the IP Access List API to define a set of approved IP addresses, including allow and block lists. All incoming access to the web application and REST APIs requires that the user connect from an authorized IP address, so workspaces cannot be accessed from a public network like a coffee shop or an airport unless your users connect through a VPN.

This feature is not available on all Databricks subscriptions. Contact your Databricks account team with any questions about access for your account.

For more information, see Configure IP access lists for workspaces.
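The interaction between allow and block lists can be sketched with Python's standard `ipaddress` module. The evaluation order used here (a block-list match always rejects, and an address must match an allow list whenever one exists) is an assumption about the semantics, and the payload field names `label`, `list_type`, and `ip_addresses` are likewise assumed; this is an illustration, not the service's implementation.

```python
import ipaddress


def is_allowed(ip: str, allow: list, block: list) -> bool:
    """Decide whether an address passes the given allow and block lists."""
    addr = ipaddress.ip_address(ip)

    def matches(entries):
        # Entries may be single addresses or CIDR ranges.
        return any(addr in ipaddress.ip_network(e, strict=False) for e in entries)

    if matches(block):
        return False  # assumed: block lists take precedence
    return matches(allow) if allow else True


# Hypothetical request body for defining an allow list (field names assumed).
office_list = {
    "label": "office",
    "list_type": "ALLOW",
    "ip_addresses": ["203.0.113.0/24"],
}

inside = is_allowed("203.0.113.7", office_list["ip_addresses"], ["203.0.113.99"])
blocked = is_allowed("203.0.113.99", office_list["ip_addresses"], ["203.0.113.99"])
outside = is_allowed("198.51.100.1", office_list["ip_addresses"], [])
```

In this example `inside` is true, while `blocked` and `outside` are false: the first is rejected by the block list and the second matches no allow-list entry.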

Encrypt locally attached disks (Public Preview)

June 1, 2020

Some instance types you use to run clusters may have locally attached disks. Databricks may store shuffle data or ephemeral data on these locally attached disks. To ensure that all data at rest is encrypted for all storage types, including shuffle data that is stored temporarily on your cluster’s local disks, you can now enable local disk encryption using the Clusters API. See Local disk encryption.
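A minimal sketch of enabling the option in a cluster specification before submitting it to the Clusters API is shown below. The flag name `enable_local_disk_encryption` and the other spec fields and values are assumptions for this example; see Local disk encryption for the documented setting.

```python
import json


def with_local_disk_encryption(spec: dict) -> dict:
    """Return a copy of a cluster spec with local disk encryption turned on."""
    # The flag name is assumed; the input dictionary is left unchanged.
    return {**spec, "enable_local_disk_encryption": True}


# Placeholder cluster spec; the values are illustrative only.
base_spec = {
    "cluster_name": "etl-nightly",
    "spark_version": "<runtime-version>",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
}

request_body = json.dumps(with_local_disk_encryption(base_spec), sort_keys=True)
```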