May 2021

These features and Databricks platform improvements were released in May 2021.

Note

Releases are staged. Your Databricks account may not be updated until a week or more after the initial release date.

Databricks Mosaic AI: a data-native and collaborative solution for the full ML lifecycle

May 27, 2021

The new Machine Learning persona, selectable in the sidebar of the Databricks UI, gives you easy access to a new purpose-built environment for ML, including the model registry and four new features in Public Preview:

  • A new dashboard page with convenient resources, recents, and getting started links.

  • A new Experiments page that centralizes experiment discovery and management.

  • Mosaic AutoML, a way to automatically generate ML models from data and accelerate the path to production.

  • Feature Store, a way to catalog ML features and make them available for training and serving, increasing reuse. With a data-lineage–based feature search that leverages automatically logged data sources, you can make features available for training and serving with simplified model deployment that doesn’t require changes to the client application.

For details, see AI and machine learning on Databricks.

SQL Analytics is renamed to Databricks SQL

May 27, 2021

SQL Analytics is renamed to Databricks SQL. For more details, see the Databricks SQL release notes.

Create and manage ETL pipelines using Delta Live Tables (Public Preview)

May 26, 2021

Databricks is pleased to introduce Delta Live Tables, a cloud service that makes extract, transform, and load (ETL) development simple, reliable, and scalable. Delta Live Tables:

  • Provides an intuitive and familiar declarative interface to build pipelines.

  • Enables you to monitor data processing pipelines, visualize dependencies, and manage pipelines and dependencies across different environments.

  • Enables test-driven development, enforcement of data quality constraints, and application of uniform data error handling policies.

  • Automates deployment of your data processing pipelines so you can easily upgrade, roll back, and incrementally reprocess data.

See What is Delta Live Tables? for details.

Reduced scope of required egress rules for customer-managed VPCs

May 26, 2021

For E2 workspaces using customer-managed VPCs, you can now configure a more controlled set of egress (outbound) rules for your workspace’s security group and the subnet-level network ACL.

  • Previously, Databricks required access to 0.0.0.0/0 for all ports and protocols.

  • Now, you only need to grant TCP access for ports 443 and 3306 in all cases, and additionally port 6666 if you use PrivateLink.

No immediate change is required; the broader egress rules continue to work.
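As a rough illustration of the reduced rule set, the following Python sketch builds the list of TCP egress rules described above. The function name `build_egress_rules`, the dictionary layout (modeled on the EC2 security group `IpPermissions` shape), and the default destination CIDR are assumptions made for this example, not values prescribed by this release note.

```python
from typing import Dict, List


def build_egress_rules(use_privatelink: bool, cidr: str = "0.0.0.0/0") -> List[Dict]:
    """Return TCP egress rules for the ports named in the release note.

    The destination CIDR is an assumption for illustration; use the
    destinations that apply to your own network design.
    """
    # Ports 443 and 3306 are needed in all cases.
    ports = [443, 3306]
    # Port 6666 is additionally needed only when PrivateLink is used.
    if use_privatelink:
        ports.append(6666)
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": port,
            "ToPort": port,
            "IpRanges": [{"CidrIp": cidr}],
        }
        for port in ports
    ]


if __name__ == "__main__":
    for rule in build_egress_rules(use_privatelink=True):
        print(rule["IpProtocol"], rule["FromPort"], rule["IpRanges"][0]["CidrIp"])
```

The same port list applies to both the security group and the subnet-level network ACL; only the way each is configured differs.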

Encrypt Databricks SQL queries and query history using your own key (Public Preview)

May 20, 2021

For details, see the Databricks SQL release notes.

Increased limit for the number of terminated all-purpose clusters

May 18, 2021: Version 3.46

You can now have up to 150 terminated all-purpose clusters in a Databricks workspace. Previously the limit was 120. For details, see Terminate a compute. The limit on the number of terminated all-purpose clusters returned by the Clusters API request is also now 150.

Increased limit for the number of pinned clusters

May 18, 2021: Version 3.46

You can now have up to 70 pinned clusters in a Databricks workspace. Previously the limit was 50. For details, see Pin a compute.

Manage where notebook results are stored (Public Preview)

May 18, 2021: Version 3.46

You can now choose to store all notebook results in your root S3 storage bucket regardless of size or run type. By default, some results for interactive notebooks are stored in Databricks. A new configuration enables you to store these in the root S3 storage bucket in your own account. For details, see Configure notebook result storage location.

This feature has no impact on notebooks run as jobs, whose results are always stored in the root S3 storage bucket.

The new improved account console is GA

May 17, 2021

The new account console is now GA. It is available on the E2 version of the Databricks platform, which is now enabled for almost all Databricks accounts.

The new account console gives you the ability to manage all of your workspaces in one place:

  • Create and manage the lifecycle of multiple workspaces.

  • View your organization’s spend on Databricks across all workspaces.

  • Delegate account administration and set up single sign-on for account admins.

For details, see Manage your Databricks account.

Customer-managed keys for workspace storage (Public Preview)

May 10, 2021

Customer-managed keys for workspace storage is now in Public Preview in all Databricks regions that support the E2 architecture, except us-west-1. A workspace storage key encrypts two types of data for your workspace. It encrypts your workspace’s S3 bucket, which contains the DBFS root and workspace system data. Optionally, the same key can be used to encrypt your workspace’s cluster node EBS volumes. You can add a customer-managed workspace storage key to a new or existing E2 workspace.

Changes to the Account API for customer-managed keys

May 10, 2021

The Account API includes the following updates to support the Public Preview of customer-managed keys for workspace storage and the renaming of “customer-managed keys for notebooks” to “customer-managed keys for managed services”:

  • For key configurations, there is a new required property use_cases. It indicates the purpose of the key.

  • For workspace configurations, the customer_managed_key_id property was renamed to managed_services_customer_managed_key_id. API calls made after May 10, 2021 use the new name for requests and responses.
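Clients that still build workspace payloads with the old property name can be adapted with a small shim. The following Python sketch is one possible approach; the helper name `migrate_workspace_payload` and the sample payload values are hypothetical, and only the two property names come from this release note.

```python
from typing import Any, Dict

OLD_FIELD = "customer_managed_key_id"
NEW_FIELD = "managed_services_customer_managed_key_id"


def migrate_workspace_payload(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a workspace payload that uses the new property name.

    Hypothetical helper: the input dictionary is not modified.
    """
    migrated = dict(payload)
    if OLD_FIELD in migrated:
        old_value = migrated.pop(OLD_FIELD)
        # If the caller already supplied the new name, keep that value.
        migrated.setdefault(NEW_FIELD, old_value)
    return migrated


if __name__ == "__main__":
    # Placeholder values for illustration only.
    legacy = {"workspace_name": "example", OLD_FIELD: "key-config-id"}
    print(migrate_workspace_payload(legacy))
```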

Google Cloud Storage connector (GA)

May 4, 2021: Version 3.45

You can read from and write to Google Cloud Storage (GCS) buckets in Databricks with gs: URLs. See Connect to Google Cloud Storage.

Databricks Runtime 7.4 series support ends

May 3, 2021

Support for Databricks Runtime 7.4, Databricks Runtime 7.4 for Machine Learning, and Databricks Runtime 7.4 for Genomics ended on May 3. See Databricks support lifecycles.

Better governance with enhanced audit logging

May 3-10, 2021: Version 3.45

Audit logs now capture:

  • Downloading of results from a notebook cell (actions: downloadPreviewResults or downloadLargeResults).

  • Exporting a notebook (action: workspaceExport).

  • Changing the admin console setting that allows or prevents notebook cell downloads (action: workspaceConfEdit, parameter: workspaceConfKeys = enableResultsDownloading).

  • Changing the admin console setting that allows or prevents notebook export (action: workspaceConfEdit, parameter: workspaceConfKeys = enableExportNotebook).

See the events in the Workspace-level audit log events table.
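To pick these events out of delivered audit logs, a filter along the following lines may help. This is a minimal Python sketch that assumes each log line is a JSON object with an `actionName` field and a `requestParams` object holding `workspaceConfKeys`; those field names and the function names are assumptions for this example, so check them against the audit log events table before relying on them.

```python
import json
from typing import Dict, Iterable, Iterator

# Action and setting names taken from the list above.
DOWNLOAD_AND_EXPORT_ACTIONS = {
    "downloadPreviewResults",
    "downloadLargeResults",
    "workspaceExport",
}
CONF_KEYS = {"enableResultsDownloading", "enableExportNotebook"}


def is_governance_event(event: Dict) -> bool:
    """Return True if the event is one of the newly captured actions."""
    action = event.get("actionName")
    if action in DOWNLOAD_AND_EXPORT_ACTIONS:
        return True
    if action == "workspaceConfEdit":
        params = event.get("requestParams") or {}
        return params.get("workspaceConfKeys") in CONF_KEYS
    return False


def filter_governance_events(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield matching events from an iterable of JSON-encoded log lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if is_governance_event(event):
            yield event
```

For example, passing an open log file to `filter_governance_events` yields only the download, export, and related setting-change events.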

Use SSO to authenticate to the account console (Public Preview)

May 3-10, 2021: Version 3.45

Account administrators can now authenticate to the account console using single sign-on (SSO) backed by your organization’s identity provider. The account must be on the E2 version of the Databricks platform. Most Databricks accounts are now on E2; if you are unsure, consult your Databricks account team.

For details, see Configure SSO in Databricks.

Repos users can now integrate with Azure DevOps using personal access tokens

May 3-10, 2021: Version 3.45

Repos now supports Azure DevOps. For details, see Set up Databricks Git folders (Repos).

Databricks administrators can use the Service Principals API to create API-only service principals and grant them access to Databricks resources, just like normal users. Administrators can create personal access tokens on behalf of service principals. The workspace must be on the E2 version of the Databricks platform. For details, see Manage service principals. (Public preview)

Jobs service stability and scalability improvements (Public Preview)

May 3-10, 2021: Version 3.45

To increase the stability and scalability of the Jobs service, each new job and run will now be assigned a unique, non-sequential identifier that may not monotonically increase. Clients that use the Jobs API and depend on sequential or monotonically increasing identifiers must be modified to accept non-sequential and unordered IDs.
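One way to remove a dependency on ID ordering is to order runs by a timestamp instead. The Python sketch below assumes each run is a dictionary with a `run_id` and a numeric `start_time` field, as a client might receive from a runs listing; the helper name `latest_run` and the sample values are hypothetical.

```python
from typing import Dict, List, Optional


def latest_run(runs: List[Dict]) -> Optional[Dict]:
    """Return the most recently started run, or None if there are no runs.

    Ordering uses start_time rather than run_id, because IDs are no longer
    guaranteed to be sequential or monotonically increasing.
    """
    if not runs:
        return None
    return max(runs, key=lambda run: run["start_time"])


if __name__ == "__main__":
    # Placeholder data: the newest run has the smallest ID.
    runs = [
        {"run_id": 90412, "start_time": 1620000000000},
        {"run_id": 17, "start_time": 1620000900000},
    ]
    print(latest_run(runs)["run_id"])
```

Treating IDs as opaque keys, used only for equality checks and lookups, avoids this class of problem entirely.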

Service principals provide API-only access to Databricks resources (Public Preview)

May 3-10, 2021: Version 3.45

Databricks administrators can use the Service Principals API to add API-only service principals to Databricks and grant them access to resources, just like normal users. Administrators can create personal access tokens on behalf of service principals. For details, see Manage service principals.