May 2020
These features and Databricks platform improvements were released in May 2020.
Note
Releases are staged. Your Databricks account may not be updated until up to a week after the initial release date.
Databricks Runtime 6.6 for Genomics GA
May 26, 2020
Databricks Runtime 6.6 for Genomics is built on top of Databricks Runtime 6.6 and includes the following new features:
GFF3 reader
Custom reference genome support
Per-sample pipeline timeouts
BAM export option
Manifest blobs
Databricks Runtime 6.6 ML GA
May 26, 2020
Databricks Runtime 6.6 ML is built on top of Databricks Runtime 6.6 and includes the following new features:
Upgraded MLflow from 1.7.0 to 1.8.0
For more information, see the complete Databricks Runtime 6.6 ML (EoS) release notes.
Databricks Runtime 6.6 GA
May 26, 2020
Databricks Runtime 6.6 brings many library upgrades and new features, including the following Delta Lake features:
You can now evolve the schema of a table automatically with the merge operation. This is useful in scenarios where you want to upsert change data into a table and the schema of the data changes over time. Instead of detecting and applying schema changes before upserting, merge can simultaneously evolve the schema and upsert the changes. See Automatic schema evolution for Delta Lake merge.
The performance of merge operations that have only matched clauses, that is, only update and delete actions and no insert action, has been improved.
Parquet tables that are referenced in the Hive metastore are now convertible to Delta Lake through their table identifiers using CONVERT TO DELTA.
For more information, see the complete Databricks Runtime 6.6 (EoS) release notes.
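The merge schema evolution and CONVERT TO DELTA behaviors described above can be sketched in Spark SQL. This is a minimal, illustrative fragment: the table and column names are made up, and schema auto-merge must be enabled via the configuration shown.

```sql
-- Allow merge to add source columns that are missing from the target table.
SET spark.databricks.delta.schema.autoMerge.enabled = true;

MERGE INTO events AS t
USING event_updates AS s
ON t.eventId = s.eventId
WHEN MATCHED THEN UPDATE SET *     -- matched-only merges (update/delete, no insert) are now faster
WHEN NOT MATCHED THEN INSERT *;

-- Convert a Hive-metastore Parquet table to Delta using its table identifier.
CONVERT TO DELTA default.events_parquet;
```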
Easily view large numbers of MLflow registered models
May 21-28, 2020: Version 3.20
The MLflow Model Registry now supports server-side search and pagination for registered models, which enables organizations with large numbers of models to efficiently perform listing and search. As before, you can search models by name and get results ordered by name or the last updated time. However, if you have a large number of models, the pages will load much faster, and search will fetch the most up-to-date view of models.
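Server-side pagination of this kind typically hands back a page of results plus a continuation token. A small helper like the sketch below can walk all pages; `fetch_page` is a stand-in for a paginated listing call such as MLflow's `search_registered_models(page_token=...)`, and its exact signature here is an assumption for illustration.

```python
def fetch_all(fetch_page):
    """Collect every item from a token-paginated listing API.

    fetch_page(page_token) is assumed to return a (items, next_page_token)
    tuple, where next_page_token is None once the last page is reached.
    """
    items, token = [], None
    while True:
        page, token = fetch_page(token)
        items.extend(page)
        if not token:
            return items
```

With a real client, `fetch_page` would wrap the registry search call and pass the token through on each request.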
Libraries configured to be installed on all clusters are not installed on clusters running Databricks Runtime 7.0 and above
May 21-28, 2020: Version 3.20
In Databricks Runtime 7.0 and above, the underlying version of Apache Spark uses Scala 2.12. Since libraries compiled against Scala 2.11 can disable Databricks Runtime 7.0 clusters in unexpected ways, clusters running Databricks Runtime 7.0 and above do not install libraries configured to be installed on all clusters. The cluster Libraries tab shows a status of Skipped and a deprecation message related to the changes in library handling.
If you have a cluster that was created on an earlier version of Databricks Runtime before 3.20 was released to your workspace, and you now edit that cluster to use Databricks Runtime 7.0, any libraries that were configured to be installed on all clusters will be installed on that cluster. In this case, any incompatible JARs in the installed libraries can cause the cluster to be disabled. The workaround is either to clone the cluster or to create a new cluster.
Databricks Runtime 7.0 for Genomics (Beta)
May 21, 2020
Databricks Runtime 7.0 for Genomics is built on top of Databricks Runtime 7.0 and includes the following library changes:
The ADAM library has been updated from version 0.30.0 to 0.32.0.
The Hail library is not included in Databricks Runtime 7.0 for Genomics as there is no release based on Apache Spark 3.0.
Databricks Runtime 7.0 ML (Beta)
May 21, 2020
Databricks Runtime 7.0 ML is built on top of Databricks Runtime 7.0 and includes the following new features:
Notebook-scoped Python libraries and custom environments managed by conda and pip commands.
Updates for major Python packages including tensorflow, tensorboard, pytorch, xgboost, sparkdl, and hyperopt.
Newly added Python packages lightgbm, nltk, petastorm, and plotly.
RStudio Server Open Source v1.2.
For more information, see the complete Databricks Runtime 7.0 ML (EoS) release notes.
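Notebook-scoped libraries are driven by %pip and %conda magic commands run directly in notebook cells, so a package installed this way is visible only to the current notebook's environment rather than the whole cluster. The package and version below are purely illustrative:

```
%pip install nltk==3.5   # installed only for this notebook's environment
%pip freeze              # list the packages visible to this notebook
```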
Databricks Runtime 6.6 for Genomics (Beta)
May 7, 2020
Databricks Runtime 6.6 for Genomics is built on top of Databricks Runtime 6.6 and includes the following new features:
GFF3 reader
Custom reference genome support
Per-sample pipeline timeouts
BAM export option
Manifest blobs
Databricks Runtime 6.6 ML (Beta)
May 7, 2020
Databricks Runtime 6.6 ML is built on top of Databricks Runtime 6.6 and includes the following new features:
Upgraded MLflow from 1.7.0 to 1.8.0
For more information, see the complete Databricks Runtime 6.6 ML (EoS) release notes.
Databricks Runtime 6.6 (Beta)
May 7, 2020
Databricks Runtime 6.6 (Beta) brings many library upgrades and new features, including the following Delta Lake features:
You can now evolve the schema of a table automatically with the merge operation. This is useful in scenarios where you want to upsert change data into a table and the schema of the data changes over time. Instead of detecting and applying schema changes before upserting, merge can simultaneously evolve the schema and upsert the changes. See Automatic schema evolution for Delta Lake merge.
The performance of merge operations that have only matched clauses, that is, only update and delete actions and no insert action, has been improved.
Parquet tables that are referenced in the Hive metastore are now convertible to Delta Lake through their table identifiers using CONVERT TO DELTA.
For more information, see the complete Databricks Runtime 6.6 (EoS) release notes.
Job clusters now tagged with job name and ID
May 5-12, 2020: Version 3.19
Job clusters are automatically tagged with the job name and ID. The tags appear in the billable usage reports so that you can easily attribute your DBU usage by job and identify anomalies. The tags are sanitized to conform to cluster tag specifications, such as allowed characters, maximum size, and maximum number of tags. The job name is contained in the RunName tag and the job ID is contained in the JobId tag.
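Once the RunName and JobId tags appear in billable usage reports, attributing DBU usage by job is a simple group-and-sum. The sketch below assumes each usage record is a dict with a `tags` mapping and a numeric `dbus` field; these field names are illustrative, not the exact report schema.

```python
from collections import defaultdict

def dbu_by_job(usage_records):
    """Sum DBUs per job using the RunName/JobId cluster tags.

    Records without a JobId tag (e.g. all-purpose clusters) are grouped
    under an "untagged" key.
    """
    totals = defaultdict(float)
    for rec in usage_records:
        tags = rec.get("tags", {})
        key = (tags.get("JobId", "untagged"), tags.get("RunName", ""))
        totals[key] += rec["dbus"]
    return dict(totals)
```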
DBFS REST API delete endpoint size limit
May 5-12, 2020: Version 3.19
When you delete a large number of files recursively using the DBFS API, the delete operation is done in increments. The call returns a response after approximately 45s with an error message asking you to re-invoke the delete operation until the directory structure is fully deleted. For example:
{
  "error_code": "PARTIAL_DELETE",
  "message": "The requested operation has deleted 324 files. There are more files remaining. You must make another request to delete more."
}
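A caller therefore needs to re-invoke the delete until the PARTIAL_DELETE response stops appearing. The retry loop below sketches that pattern; `delete_once` is a placeholder for whatever function issues the POST to /api/2.0/dbfs/delete and returns the parsed JSON response.

```python
def delete_until_done(delete_once, max_attempts=100):
    """Repeatedly invoke a recursive DBFS delete until it completes.

    delete_once() is assumed to return the parsed JSON body of the
    /api/2.0/dbfs/delete response; any response whose error_code is not
    "PARTIAL_DELETE" means the directory structure is fully deleted.
    """
    for _ in range(max_attempts):
        resp = delete_once()
        if resp.get("error_code") != "PARTIAL_DELETE":
            return resp
    raise RuntimeError("directory not fully deleted after max_attempts calls")
```

The `max_attempts` cap guards against looping forever if the directory keeps growing while it is being deleted.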