September 2020

These features and Databricks platform improvements were released in September 2020.

Note

Releases are staged. Your Databricks account may not be updated until up to a week after the initial release date.

Databricks Runtime 7.3, 7.3 ML, and 7.3 Genomics are now GA

September 24, 2020

Databricks Runtime 7.3, Databricks Runtime 7.3 for Machine Learning, and Databricks Runtime 7.3 for Genomics are now generally available. They bring many features and improvements, including:

  • Delta Lake performance optimizations significantly reduce overhead

  • Clone metrics

  • Delta Lake MERGE INTO improvements

  • Specify the initial position for Delta Lake Structured Streaming

  • Auto Loader improvements

  • Adaptive query execution

  • Azure Synapse Analytics connector column length control

  • Improved behavior of dbutils.credentials.showRoles

  • Kinesis starting position for stream using at_timestamp

  • Simplified pandas to Spark DataFrame conversion

  • New maxResultSize in toPandas() call

  • Debuggability of pandas and PySpark UDFs

  • GA of S3 storage connector updates

  • (ML only) Conda activation on workers

  • (Genomics only) Support for reading BGEN files with uncompressed or zstd-compressed genotypes

  • Library upgrades

For more information, see Databricks Runtime 7.3 LTS (unsupported) and Databricks Runtime 7.3 LTS for Machine Learning (unsupported).

Debugging hints for SAML credential passthrough misconfigurations

September 23-29, 2020: Version 3.29

The response from a single sign-on request using SAML credential passthrough now includes an error hint to help debug misconfigurations. For details, see Troubleshooting.

Single Node clusters (Public Preview)

September 23-29, 2020: Version 3.29

A Single Node cluster is a cluster consisting of a Spark driver and no Spark workers. In contrast, Standard mode clusters require at least one Spark worker to run Spark jobs. Single Node mode clusters are helpful in the following situations:

  • Running single node machine learning workloads that need Spark to load and save data

  • Lightweight exploratory data analysis (EDA)
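
As a rough illustration of "a Spark driver and no Spark workers", the sketch below builds a cluster specification with zero workers. The field names, the `singleNode` profile setting, the `ResourceClass` tag, and the example runtime and instance values are assumptions based on the Clusters API as commonly documented; verify them against the current API reference before relying on them.

```python
import json


def single_node_cluster_spec(cluster_name, spark_version, node_type_id):
    """Build a cluster spec dict describing a driver-only cluster.

    All keys below are assumptions about the Clusters API payload,
    not values taken from these release notes.
    """
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # No Spark workers: everything runs on the driver.
        "num_workers": 0,
        "spark_conf": {
            "spark.databricks.cluster.profile": "singleNode",
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }


# Placeholder name, runtime version, and instance type for illustration only.
spec = single_node_cluster_spec("eda-single-node", "7.3.x-scala2.12", "i3.xlarge")
print(json.dumps(spec, indent=2))
```

The resulting dictionary could be sent as the JSON body of a cluster creation request; sending it is left out here so the sketch stays self-contained.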

For details, see Single-node or multi-node compute.

DBFS REST API rate limiting

September 23-29, 2020: Version 3.29

To ensure high quality of service under heavy load, Databricks is now enforcing API rate limits for DBFS API calls. Limits are set per workspace to ensure fair usage and high availability. Automatic retries are available using Databricks CLI version 0.12.0 and above. We advise all customers to switch to the latest Databricks CLI version.

New sidebar icons

September 23-29, 2020

We’ve updated the sidebar in the Databricks workspace UI. No big deal, but we think the new icons look pretty nice.

Running jobs limit increase

September 23-29, 2020: Version 3.29

The limit on concurrently running job runs has been increased from 150 to 1000 per workspace. Runs beyond 150 are no longer queued in the pending state. Instead of queuing run requests that exceed the concurrent run limit, Databricks returns a 429 Too Many Requests response when you request a run that cannot be started immediately. This limit increase was rolled out gradually and is now available on all workspaces in all regions.
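
Because a run request over the limit is rejected with 429 rather than queued, callers may want to retry on their side. The following is a minimal sketch of exponential backoff, not a Databricks client: `submit` is a hypothetical callable that returns a `(status_code, body)` pair, and the attempt count and delays are arbitrary choices.

```python
import time


def submit_with_backoff(submit, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call submit() until it stops returning 429 or attempts run out.

    submit is a hypothetical callable returning (status_code, body).
    The delay doubles after each 429: base_delay, 2*base_delay, ...
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = submit()
        if status != 429:
            return status, body
        # Only wait if another attempt will follow.
        if attempt < max_attempts - 1:
            sleep(base_delay * (2 ** attempt))
    # Every attempt was rejected; return the last 429 to the caller.
    return status, body


# Stubbed responses: two rejections, then success.
responses = iter([(429, "Too Many Requests"), (429, "Too Many Requests"), (200, {"run_id": 1})])
delays = []
result = submit_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; in practice you would leave it at its default.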

Artifact access control lists (ACLs) in MLflow

September 23-29, 2020: Version 3.29

MLflow Experiment permissions are now enforced on artifacts in MLflow Tracking, enabling you to easily control access to your models, datasets, and other files. By default, when you create a new experiment, its run artifacts are now stored in an MLflow-managed location. The four MLflow Experiment permissions levels (NO PERMISSIONS, CAN READ, CAN EDIT, and CAN MANAGE) automatically apply to run artifacts stored in MLflow-managed locations as follows:

  • CAN EDIT or CAN MANAGE permissions are required to log run artifacts to an experiment.

  • CAN READ permissions are required to list and download run artifacts from an experiment.
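
The two rules above can be sketched as a small lookup. This is an illustration only, not MLflow code: the helper names are hypothetical, and the sketch assumes the four levels are ordered so that each level includes the abilities of the ones below it.

```python
# Permission levels from lowest to highest. Treating them as a strict
# hierarchy is an assumption made for this sketch.
LEVELS = ["NO PERMISSIONS", "CAN READ", "CAN EDIT", "CAN MANAGE"]


def _rank(level):
    """Return the position of a level in the hierarchy."""
    return LEVELS.index(level)


def can_read_artifacts(level):
    """Listing and downloading run artifacts needs at least CAN READ."""
    return _rank(level) >= _rank("CAN READ")


def can_log_artifacts(level):
    """Logging run artifacts needs CAN EDIT or CAN MANAGE."""
    return _rank(level) >= _rank("CAN EDIT")


for level in LEVELS:
    print(f"{level}: read={can_read_artifacts(level)}, log={can_log_artifacts(level)}")
```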

For more information, see MLflow experiment ACLs.

MLflow usability improvements

September 23-29, 2020: Version 3.29

This release includes the following MLflow usability improvements:

  • The MLflow Experiment and Registered Models pages now have tips to help new users get started.

  • The model version table now shows the description text for a model version. A new column shows the first 32 characters or the first line (whichever is shorter) of the description.
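
The truncation rule for the description column can be read as "take the first line, then cap it at 32 characters". A minimal sketch of that reading follows; the function name is hypothetical and the actual UI logic may differ.

```python
def description_preview(description, limit=32):
    """Return the first line of a description, capped at `limit` characters."""
    # splitlines() returns an empty list for an empty string, so guard for it.
    first_line = description.splitlines()[0] if description else ""
    return first_line[:limit]


print(description_preview("Baseline model\nTrained on the September snapshot."))
print(description_preview("A single long line that keeps going well past the limit"))
```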

New Databricks Power BI connector (Public Preview)

September 22, 2020

Power BI Desktop version 2.85.681.0 includes a new Databricks Power BI connector that makes the integration between Databricks and Power BI far more seamless and reliable. The new connector comes with the following improvements:

  • Simple connection configuration: the new Power BI Databricks connector is integrated into Power BI, and you configure it using a simple dialog with a couple of clicks.

  • Faster imports and optimized metadata calls, thanks to the new Databricks ODBC driver, which comes with significant performance improvements.

  • Access to Databricks data through Power BI respects Databricks table access control.

For more information, see Connect Power BI to Databricks.

New JDBC and ODBC drivers bring faster, lower-latency BI

September 15, 2020

We have released new versions of the Databricks JDBC and ODBC drivers (download) with the following improvements:

  • Performance: Reduced connection and short-query latency, improved result transfer speed based on Apache Arrow serialization, and improved metadata retrieval performance.

  • User experience: Authentication using Microsoft Entra ID OAuth2 access tokens, improved error messages and auto-retry when connecting to a shut-down cluster, and more robust handling of retries on intermittent network errors.

  • Support for connections using HTTP proxy.

For more information about connecting to BI tools using JDBC and ODBC, see Databricks ODBC and JDBC Drivers.

MLflow Model Serving (Public Preview)

September 9-15, 2020: Version 3.28

MLflow Model Serving is now available in Public Preview. MLflow Model Serving allows you to deploy an MLflow model registered in Model Registry as a REST API endpoint hosted and managed by Databricks. When you enable model serving for a registered model, Databricks creates a cluster and deploys all non-archived versions of that model.

You can query all model versions using REST API requests with standard Databricks authentication. Model access rights are inherited from the Model Registry: anyone with read rights for a registered model can query any of the deployed model versions. While this service is in preview, we recommend its use for low-throughput and non-critical applications.
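
To give a sense of what querying a model version over REST might look like, the sketch below builds (but does not send) an HTTP request. The URL path `/model/<name>/<version>/invocations`, the bearer-token header, the content type, and the split-oriented payload are all assumptions; check the Model Serving documentation for the exact endpoint and input format. The host, model name, and token are placeholders.

```python
import json
import urllib.parse
import urllib.request


def build_scoring_request(host, model_name, version, token, payload):
    """Build a POST request for one served model version.

    The URL layout and headers are assumptions made for this sketch.
    """
    quoted_name = urllib.parse.quote(model_name, safe="")
    url = f"https://{host}/model/{quoted_name}/{version}/invocations"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; the payload shape is a guess at a split-oriented table.
request = build_scoring_request(
    host="example.cloud.databricks.com",
    model_name="my-model",
    version=1,
    token="<personal-access-token>",
    payload={"columns": ["feature_a", "feature_b"], "data": [[1.0, 2.0]]},
)
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; that step is omitted so the example runs without a workspace.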

For more information, see Legacy MLflow Model Serving on Databricks.

Clusters UI improvements

September 9-15, 2020: Version 3.28

The Clusters page now has separate tabs for All-Purpose Clusters and Job Clusters. The list on each tab is now paginated. In addition, we have fixed the delay that sometimes occurred between creating a cluster and being able to see it in the UI.

Visibility controls for jobs, clusters, notebooks, and other workspace objects

September 9-15, 2020: Version 3.28

By default, any user can see all jobs, clusters, notebooks, and folders in their workspace displayed in the Databricks UI and can list them using the Databricks API, even when access control is enabled for those objects and a user has no permissions on those objects.

Now any Databricks admin can enable visibility controls for notebooks and folders (workspace objects), clusters, and jobs to ensure that users can view only those objects that they have been given access to through workspace, cluster, or jobs access control.

See Access controls lists can no longer be disabled.

Ability to create tokens no longer permitted by default

September 9-15, 2020: Version 3.28

For workspaces created after the release of Databricks platform version 3.28, users will no longer have the ability to generate personal access tokens by default. Admins must explicitly grant those permissions, whether to the entire users group or on a user-by-user or group-by-group basis. Workspaces created before 3.28 was released will maintain the permissions that were already in place.

See Monitor and manage personal access tokens.

Support for c5.24xlarge instances

September 9-15, 2020: Version 3.28

Databricks now supports the c5.24xlarge EC2 instance type.

MLflow Model Registry supports sharing of models across workspaces

September 9, 2020

Databricks now supports access to the model registry from multiple workspaces. You can now register models, track model runs, and load models across workspaces. Multiple teams can now share access to models, and organizations can use multiple workspaces to handle the different stages of development. For details, see Share models across workspaces.

This functionality requires MLflow Python client version 1.11.0 or above.
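
A quick way to check the client requirement is to compare version numbers as integers rather than as strings, since "1.9.1" sorts after "1.11.0" as text. The sketch below is a simplified check with hypothetical helper names; it handles only plain numeric versions such as "1.11.0", not pre-release suffixes.

```python
MIN_MLFLOW_VERSION = (1, 11, 0)


def parse_version(version):
    """Turn '1.11.0' into (1, 11, 0), padding short versions with zeros."""
    parts = [int(part) for part in version.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def supports_cross_workspace_registry(installed_version):
    """Return True if the client version is 1.11.0 or above."""
    return parse_version(installed_version) >= MIN_MLFLOW_VERSION


print(supports_cross_workspace_registry("1.9.1"))
print(supports_cross_workspace_registry("1.11.0"))
```

In a notebook, the installed version string would typically come from the MLflow package itself rather than a literal.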

Databricks Runtime 7.3 (Beta)

September 3, 2020

Databricks Runtime 7.3, Databricks Runtime 7.3 for Machine Learning, and Databricks Runtime 7.3 for Genomics are now available as Beta releases.

For information, see Databricks Runtime 7.3 LTS (unsupported) and Databricks Runtime 7.3 LTS for Machine Learning (unsupported).

E2 architecture—now GA—provides better security, scalability, and management tools

September 1, 2020

Databricks is excited to announce the general availability of the new E2 architecture for the Databricks Unified Data Analytics Platform on AWS. With this release, we have added business-critical features that make the platform more secure, more scalable, and simpler to manage for all of your data pipeline, analytics, and machine learning workloads.

The Databricks platform now provides stronger security controls required by regulated enterprises, is API-driven for better automation support, and increases the scalability of your production and business-critical operations. For more information, see our blog post.

Account API is generally available on the E2 version of the platform

September 1, 2020

As part of the GA of the E2 version of the Databricks platform, the Multi-workspace API has been renamed the Account API, and all endpoints related to workspace creation and customer-managed VPCs are also GA. To use the Account API to create new workspaces, your account must be on the E2 version of the platform or on a select custom plan that allows multiple workspaces per account. Only E2 accounts allow customer-managed VPCs.

Billable usage delivery configuration also requires the Account API. This feature is available on all Databricks accounts, but remains in Public Preview.

Secure cluster connectivity (no public IPs) is now the default on the E2 version of the platform

September 1, 2020

As part of the GA of the E2 version of the Databricks platform, secure cluster connectivity (no public IPs) is now the default for workspaces created on that version of the platform.

For more information, see Secure cluster connectivity.