January 2020
These features and Databricks platform improvements were released in January 2020.
Note
Releases are staged. Your Databricks account may not be updated until up to a week after the initial release date.
This month saw the release of Databricks platform versions 3.9 and 3.11. There was no release of versions 3.10 or 3.8. Version 3.7 was a stability and bug-fix-only release.
All cluster and pool tags now propagate to usage reports
January 30, 2020
We have improved the way we propagate tags to capture DBU usage. If a cluster is created from a pool, both custom and default pool tags are now propagated to the cluster level in addition to the existing cluster default tags. In the usage download CSV report, there is a new tags column that shows all custom and default cluster tags and any custom and default pool tags as a string. This allows for better visibility into Databricks usage (total cost of ownership) and accurate attribution to business units and teams within your organization.
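As a rough illustration of how the tags column could be used for attribution, the following sketch sums DBUs per value of one tag. It assumes the tags string is a JSON object and that the report has clusterId and dbus columns; the sample rows, the tag names, and the dbus_by_tag helper are hypothetical, so check them against your own downloaded report.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical sample rows; a real usage report has more columns.
# Assumption: the tags column holds a JSON object encoded as a string.
SAMPLE_CSV = '''clusterId,dbus,tags
c-1,10.5,"{""team"": ""genomics"", ""pool"": ""shared""}"
c-2,4.0,"{""team"": ""ml""}"
c-3,2.5,"{""team"": ""genomics""}"
'''


def dbus_by_tag(csv_text, tag_key):
    """Sum the dbus column per value of one tag key."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Rows without tags are grouped under a placeholder label.
        tags = json.loads(row["tags"]) if row["tags"] else {}
        totals[tags.get(tag_key, "<untagged>")] += float(row["dbus"])
    return dict(totals)


print(dbus_by_tag(SAMPLE_CSV, "team"))
```

Grouping on a single tag key keeps the example short; a real report might group on several keys or join against a list of cost centers.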
Cluster and pool tag propagation to EC2 instances is more accurate
January 30, 2020
Databricks recently changed the way we propagate tags to AWS EC2 instances, and these changes may affect the way you report your Databricks EC2 instance usage.
Before the change, if a cluster was created from a pool, its EC2 instances would inherit both the custom pool tags and the custom cluster tags. However, when that cluster was terminated, its EC2 instances continued to be tagged with that cluster until a new cluster started and used those instances. This meant that the terminated cluster could be attributed incorrectly with EC2 instance usage. In addition, because AWS EC2 instance usage is reported on an hourly basis, if an instance was shared among several clusters during an hour, the usage report would not accurately reflect actual usage for each cluster.
Now, if a cluster is created from a pool, its EC2 instances inherit only the custom and default pool tags, not the cluster tags. Therefore, if you want to create clusters from a pool, make sure to assign all of the cluster tags you need to the pool. The behavior does not change for clusters that are not created from a pool: their tags continue to propagate to the EC2 instance for reporting.
For details, see Monitor usage using tags.
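The new propagation rule can be modeled in a few lines. This is only a sketch of the behavior described in this section, not Databricks code; the ec2_instance_tags function and the tag names are hypothetical.

```python
def ec2_instance_tags(cluster_tags, pool_tags=None):
    """Model which tags an EC2 instance carries under the new rule.

    cluster_tags: tags set on the cluster (custom and default).
    pool_tags: tags set on the pool, or None if the cluster
               is not created from a pool.
    """
    if pool_tags is not None:
        # Pool-backed cluster: instances inherit only the pool tags.
        return dict(pool_tags)
    # Cluster without a pool: cluster tags propagate as before.
    return dict(cluster_tags)


# A pool-backed cluster loses its cluster-only tag on the instance,
# so any tag needed for reporting should be set on the pool.
print(ec2_instance_tags({"team": "ml", "project": "churn"}, {"team": "ml"}))
print(ec2_instance_tags({"team": "ml", "project": "churn"}))
```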
Databricks Runtime 6.3 for Genomics GA
January 22, 2020
Databricks Runtime 6.3 for Genomics is built on top of Databricks Runtime 6.3. It includes many improvements and upgrades from Databricks Runtime 6.2 for Genomics.
The key features are:
Support for Delta tables as input to the joint genotyping pipeline
Automatic annotation parsing when reading VCFs
Improved multiallelic variant splitter
Faster linear and logistic regression functions
Databricks Runtime 6.3 ML GA
January 22, 2020
Databricks Runtime 6.3 ML GA brings many library upgrades, including:
PyTorch: 1.3.0 to 1.3.1
torchvision: 0.4.1 to 0.4.2
MLflow: 1.4.0 to 1.5.0
Hyperopt: 0.2.1 to 0.2.2
For details, see the complete Databricks Runtime 6.3 for ML (EoS) release notes.
Databricks Runtime 6.3 GA
January 22, 2020
Databricks Runtime 6.3 GA brings new features, improvements, and many bug fixes.
This release introduces support for reading Delta tables from other processing engines and improved concurrency. The key features are:
Support for processing engines using manifest files
Improved concurrency for all Delta Lake operations
Improved support for file compaction
Improved performance for insert-only merge
For details, see the complete Databricks Runtime 6.3 (EoS) release notes.
Cluster worker machine images now use chrony for NTP
January 16-23, 2020: Version 3.11
Databricks cluster worker Amazon Machine Images (AMIs) are now configured to use AWS chrony instead of ntpd to synchronize time. chrony is a versatile implementation of the Network Time Protocol that synchronizes time faster and more accurately than ntpd.
Cluster standard autoscaling step is now configurable
January 7-14, 2020: Version 3.9
By default, the first step of standard autoscaling adds 8 nodes. Now you can set the step value in the cluster Spark configuration. See Compute configuration reference.
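A minimal sketch of adding the step value to a cluster's Spark configuration follows. The property name spark.databricks.autoscaling.standardFirstStepUp is an assumption that this note does not state, and the with_first_step helper is hypothetical; confirm the exact key in the Compute configuration reference before relying on it.

```python
# Assumed property name; this release note does not state it.
FIRST_STEP_KEY = "spark.databricks.autoscaling.standardFirstStepUp"


def with_first_step(spark_conf, nodes):
    """Return a copy of spark_conf with the first autoscaling step set."""
    if nodes < 1:
        raise ValueError("the first step must add at least one node")
    updated = dict(spark_conf)
    # Spark configuration values are passed as strings.
    updated[FIRST_STEP_KEY] = str(nodes)
    return updated


conf = with_first_step({"spark.speculation": "true"}, 4)
print(conf)
```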
SCIM API supports pagination for Get Users and Get Groups (Public Preview)
January 7-14, 2020: Version 3.9
The SCIM API now supports pagination for Get Users and Get Groups. When you specify the startIndex and count query parameters, SCIM returns a subset of users or groups. The startIndex parameter is the 1-based index of the first result. The count parameter is the maximum number of users or groups to return. This ensures scalability for the SCIM Client and simplifies SCIM calls for Databricks admins. See Groups API.
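The paging loop implied by startIndex and count can be sketched as follows. The iter_scim_resources and fake_fetch names are hypothetical, and the Resources and totalResults response fields are taken from the SCIM 2.0 list response format rather than from this note. In real use, fetch_page would issue a GET against the Users or Groups endpoint with the two query parameters; here it is replaced by an in-memory stand-in so the example runs without a workspace.

```python
def iter_scim_resources(fetch_page, count=100):
    """Yield every resource by walking pages with startIndex and count.

    fetch_page(start_index, count) must return a dict shaped like a
    SCIM list response with "Resources" and optionally "totalResults".
    """
    start_index = 1  # startIndex is 1-based
    while True:
        page = fetch_page(start_index, count)
        resources = page.get("Resources", [])
        if not resources:
            return
        yield from resources
        start_index += len(resources)
        total = page.get("totalResults")
        if total is not None and start_index > total:
            return


def fake_fetch(start_index, count):
    """In-memory stand-in for a paginated Get Users call."""
    users = [{"userName": f"user{i}@example.com"} for i in range(1, 8)]
    chunk = users[start_index - 1 : start_index - 1 + count]
    return {"totalResults": len(users), "startIndex": start_index,
            "itemsPerPage": len(chunk), "Resources": chunk}


names = [u["userName"] for u in iter_scim_resources(fake_fetch, count=3)]
print(len(names), names[0], names[-1])
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP client or authentication scheme.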
File browser swimlane widths increased to 240px
January 7-14, 2020: Version 3.9
The increased width reduces the need to mouse over objects to see the full filename.
Databricks Runtime 3.5 LTS support ends
January 2, 2020
Support for Databricks Runtime 3.5 LTS (Long Term Support) ended on January 2. See Databricks support lifecycles.