January 2020

These features and Databricks platform improvements were released in January 2020.

Note

Releases are staged, so it can take up to a week after the initial release date for your Databricks account to be updated.

This month saw the release of Databricks platform versions 3.9 and 3.11. There was no release of versions 3.10 or 3.8. Version 3.7 was a stability and bug-fix-only release.

All cluster and pool tags now propagate to usage reports

January 30, 2020

We have improved the way we propagate tags to capture DBU usage. If a cluster is created from a pool, both custom and default pool tags are now propagated to the cluster level in addition to the existing cluster default tags. In the usage download CSV report, a new tags column shows all custom and default cluster tags and any custom and default pool tags as a string. This provides better visibility into Databricks usage (total cost of ownership) and accurate attribution to business units and teams within your organization.
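As an illustration only (not part of the release itself), the sketch below shows one way the new tags column could feed a cost-attribution summary. The file name, column names (Tags, dbus), the assumption that the tag string is JSON, and the team tag key are all placeholders; check them against your own usage report and tagging scheme.

```python
import json
import pandas as pd

# Load the usage report downloaded from the account console.
# File and column names are assumptions; verify against your CSV header.
usage = pd.read_csv("databricks_usage.csv")

# The tags column serializes cluster and pool tags as a single string.
# Parsing as JSON is an assumption; adjust if your report uses another format.
usage["tags_dict"] = usage["Tags"].apply(
    lambda s: json.loads(s) if pd.notna(s) else {}
)

# Attribute DBUs to a hypothetical custom "team" tag.
usage["team"] = usage["tags_dict"].apply(lambda t: t.get("team", "untagged"))
print(usage.groupby("team")["dbus"].sum())
```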

Cluster and pool tag propagation to EC2 instances is more accurate

January 30, 2020

Databricks recently changed the way we propagate tags to AWS EC2 instances, and these changes may affect the way you report your Databricks EC2 instance usage.

Before the change, if a cluster was created from a pool, its EC2 instances inherited both the custom pool tags and the custom cluster tags. However, when that cluster was terminated, its EC2 instances remained tagged with that cluster's tags until a new cluster started and used those instances. This meant that EC2 instance usage could be attributed incorrectly to the terminated cluster. In addition, because AWS EC2 instance usage is reported on an hourly basis, if an instance was shared among several clusters during an hour, the usage report would not accurately reflect actual usage for each cluster.

Now, if a cluster is created from a pool, its EC2 instances inherit only the custom and default pool tags, not the cluster tags. Therefore, if you want to create clusters from a pool, make sure to assign to the pool all of the cluster tags you need. The behavior does not change for clusters that are not created from a pool: their tags continue to propagate to their EC2 instances for reporting.

For details, see Monitor usage using tags.
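To illustrate where pool tags are defined, here is a minimal sketch of creating a pool with custom tags through the Instance Pools API. The workspace URL, token, pool name, node type, and tag values are placeholders, and the request shape assumes the 2.0 API as documented at the time.

```python
import requests

# Placeholders: replace with your workspace URL and a personal access token.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

pool_spec = {
    "instance_pool_name": "analytics-pool",
    "node_type_id": "i3.xlarge",
    "min_idle_instances": 2,
    # Clusters created from this pool no longer pass their own custom tags to
    # EC2 instances, so define every tag you need for attribution on the pool.
    "custom_tags": {"team": "analytics", "cost-center": "1234"},
}

resp = requests.post(
    f"{HOST}/api/2.0/instance-pools/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=pool_spec,
)
resp.raise_for_status()
print(resp.json())  # returns the new instance_pool_id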

Databricks Runtime 6.3 for Genomics GA

January 22, 2020

Databricks Runtime 6.3 for Genomics is built on top of Databricks Runtime 6.3. It includes many improvements and upgrades from Databricks Runtime 6.2 for Genomics.

The key features are:

  • Support for Delta tables as input to the joint genotyping pipeline

  • Automatic annotation parsing when reading VCFs

  • Improved multiallelic variant splitter

  • Faster linear and logistic regression functions
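As a rough sketch of the VCF reading improvement listed above, the snippet below loads a VCF with the vcf data source bundled in Databricks Runtime for Genomics. The input path is a placeholder, spark is the preconfigured SparkSession in a Databricks notebook, and the column names reflect the reader's typical schema; verify them with printSchema.

```python
# Placeholder path to a VCF file in your workspace storage.
vcf_path = "/mnt/genomics/sample.vcf.gz"

# "vcf" is the data source shipped with Databricks Runtime for Genomics.
df = spark.read.format("vcf").load(vcf_path)

# Annotation fields (for example, INFO fields) are parsed into columns
# automatically, so they can be queried directly.
df.printSchema()
df.select("contigName", "start", "referenceAllele", "alternateAlleles").show(5)
```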

Databricks Runtime 6.3 ML GA

January 22, 2020

Databricks Runtime 6.3 ML GA brings many library upgrades, including:

  • PyTorch: 1.3.0 to 1.3.1

  • torchvision: 0.4.1 to 0.4.2

  • MLflow: 1.4.0 to 1.5.0

  • Hyperopt: 0.2.1 to 0.2.2

For details, see the complete Databricks Runtime 6.3 for ML (unsupported) release notes.

Databricks Runtime 6.3 GA

January 22, 2020

Databricks Runtime 6.3 GA brings new features, improvements, and many bug fixes.

This release introduces support for reading Delta tables from other processing engines and improved concurrency. The key features are:

  • Support for processing engines using manifest files

  • Improved concurrency for all Delta Lake operations

  • Improved support for file compaction

  • Improved performance for insert-only merge

For details, see the complete Databricks Runtime 6.3 (unsupported) release notes.
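A minimal sketch of the manifest support mentioned above: generating a symlink-format manifest so an external processing engine can read the Delta table. The table path is a placeholder, and the statement assumes the GENERATE command available in this Delta Lake release.

```python
# Placeholder path to an existing Delta table.
table_path = "/mnt/delta/events"

# Generate a symlink-format manifest that external engines can point to.
spark.sql(f"GENERATE symlink_format_manifest FOR TABLE delta.`{table_path}`")

# The manifest files are written under <table_path>/_symlink_format_manifest/,
# which the external engine's table definition references.
```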

Cluster worker machine images now use chrony for NTP

January 16-23, 2020: Version 3.11

Databricks cluster worker Amazon Machine Images (AMIs) are now configured to use AWS chrony instead of ntpd to synchronize time. chrony is a versatile implementation of the Network Time Protocol that synchronizes time faster and more accurately than ntpd.

Cluster standard autoscaling step is now configurable

January 7-14, 2020: Version 3.9

By default, the first step of standard autoscaling adds 8 nodes. Now you can set the step value in the cluster Spark configuration. See Compute configuration reference.
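As a sketch of where such a setting goes, the fragment below shows a Spark configuration entry inside a Clusters API 2.0 cluster specification. The configuration key shown is a hypothetical placeholder; look up the actual key name in the compute configuration reference.

```python
# Sketch only: this dict would be the body of a Clusters API 2.0 create/edit request.
cluster_spec = {
    "cluster_name": "autoscaling-demo",
    "spark_version": "6.3.x-scala2.11",
    "node_type_id": "i3.xlarge",
    "autoscale": {"min_workers": 2, "max_workers": 20},
    "spark_conf": {
        # Placeholder: replace with the documented key for the autoscaling step size.
        "<autoscaling-step-key>": "4",
    },
}
```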

SCIM API supports pagination for Get Users and Get Groups (Public Preview)

January 7-14, 2020: Version 3.9

The SCIM API now supports pagination for Get Users and Get Groups. When you specify the startIndex and count query parameters, SCIM returns a subset of users or groups. The startIndex parameter is the 1-based index of the first result, and the count parameter is the maximum number of users or groups to return. This improves scalability for SCIM clients and simplifies SCIM calls for Databricks admins. See Groups API.
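A minimal sketch of paging through Get Users with these parameters, assuming the preview SCIM endpoint path; the workspace URL and token are placeholders.

```python
import requests

# Placeholders: replace with your workspace URL and a personal access token.
HOST = "https://<your-workspace>.cloud.databricks.com"
TOKEN = "<personal-access-token>"

def list_users(page_size=50):
    start = 1  # startIndex is 1-based
    while True:
        resp = requests.get(
            f"{HOST}/api/2.0/preview/scim/v2/Users",
            headers={"Authorization": f"Bearer {TOKEN}"},
            params={"startIndex": start, "count": page_size},
        )
        resp.raise_for_status()
        resources = resp.json().get("Resources", [])
        if not resources:
            break
        yield from resources
        start += len(resources)

for user in list_users():
    print(user["userName"])
```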

File browser swimlane widths increased to 240px

January 7-14, 2020: Version 3.9

The increased width reduces the need to mouse over objects to see the full filename.

Databricks Runtime 3.5 LTS support ends

January 2, 2020

Support for Databricks Runtime 3.5 LTS (Long Term Support) ended on January 2. See Databricks runtime support lifecycles.