May 2020

These features and Databricks platform improvements were released in May 2020.

Note

Releases are staged. Your Databricks account may not be updated for up to a week after the initial release date.

Databricks Runtime 6.6 for Genomics GA

May 26, 2020

Databricks Runtime 6.6 for Genomics is built on top of Databricks Runtime 6.6 and includes the following new features:

  • GFF3 reader

  • Custom reference genome support

  • Per-sample pipeline timeouts

  • BAM export option

  • Manifest blobs

Databricks Runtime 6.6 ML GA

May 26, 2020

Databricks Runtime 6.6 ML is built on top of Databricks Runtime 6.6 and includes the following new features:

  • Upgraded mlflow: 1.7.0 to 1.8.0

For more information, see the complete Databricks Runtime 6.6 ML (unsupported) release notes.

Databricks Runtime 6.6 GA

May 26, 2020

Databricks Runtime 6.6 brings many library upgrades and new features, including the following Delta Lake features:

  • You can now evolve the schema of the table automatically with the merge operation. This is useful in scenarios where you want to upsert change data into a table and the schema of the data changes over time. Instead of detecting and applying schema changes before upserting, merge can simultaneously evolve the schema and upsert the changes. See Automatic schema evolution for Delta Lake merge.

  • The performance of merge operations that have only matched clauses (that is, only update and delete actions and no insert action) has been improved.

  • Parquet tables that are referenced in the Hive metastore are now convertible to Delta Lake through their table identifiers using CONVERT TO DELTA.

For more information, see the complete Databricks Runtime 6.6 (unsupported) release notes.
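The merge with automatic schema evolution described above can be illustrated with a small pure-Python sketch. This is not Delta Lake itself (in Delta Lake, merge runs against tables, typically with `WHEN MATCHED` / `WHEN NOT MATCHED` clauses); it only models the semantics: matched keys are updated, unmatched keys are inserted, and columns present only in the source are added to the target schema automatically.

```python
# Pure-Python sketch of merge-with-automatic-schema-evolution semantics.
# Illustration only; Delta Lake implements this natively for tables.

def merge_with_schema_evolution(target, source, key):
    """Upsert rows from `source` into `target`, keyed by `key`.

    Columns that exist only in `source` are added to the result
    automatically, mirroring automatic schema evolution for merge.
    """
    index = {row[key]: row for row in target}
    # Union of all column names: the "evolved" schema.
    schema = {col for row in target + source for col in row}
    for row in source:
        if row[key] in index:
            index[row[key]].update(row)   # matched: update
        else:
            index[row[key]] = dict(row)   # not matched: insert
    # Backfill missing columns with None so every row has the evolved schema.
    return [{col: row.get(col) for col in sorted(schema)}
            for row in index.values()]

target = [{"id": 1, "name": "a"}]
source = [{"id": 1, "name": "A", "age": 30},   # update, plus a new column
          {"id": 2, "name": "b", "age": 25}]   # insert
merged = merge_with_schema_evolution(target, source, "id")
```

Without schema evolution, the upsert would have to detect the new `age` column and alter the target first; with it, the merge evolves the schema and applies the changes in one step.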

Easily view large numbers of MLflow registered models

May 21-28, 2020: Version 3.20

The MLflow Model Registry now supports server-side search and pagination for registered models, which enables organizations with large numbers of models to efficiently perform listing and search. As before, you can search models by name and get results ordered by name or the last updated time. However, if you have a large number of models, the pages will load much faster, and search will fetch the most up-to-date view of models.
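Server-side pagination follows the usual page-token pattern: the client requests one page at a time and passes the returned token back until no token remains. The sketch below uses a hypothetical `search_page` function standing in for a paginated model-search endpoint; it is not the actual MLflow client API.

```python
# Generic page-token pagination loop, the pattern behind server-side
# search for registered models. `search_page` is a hypothetical stand-in
# for a paginated API call, not the real MLflow client API.

def fetch_all(search_page, query):
    """Collect every result by following page tokens until exhausted."""
    results, token = [], None
    while True:
        page, token = search_page(query, page_token=token)
        results.extend(page)
        if not token:          # no more pages
            return results

# A tiny in-memory "server" to demonstrate the loop.
MODELS = [f"model-{i:02d}" for i in range(7)]

def search_page(query, page_token=None, page_size=3):
    start = int(page_token or 0)
    matches = [m for m in MODELS if query in m]
    page = matches[start:start + page_size]
    next_token = (str(start + page_size)
                  if start + page_size < len(matches) else None)
    return page, next_token

all_models = fetch_all(search_page, "model")
```

Because each page is fetched on demand from the server, listing stays fast even when the registry holds a large number of models.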

Libraries configured to be installed on all clusters are not installed on clusters running Databricks Runtime 7.0 and above

May 21-28, 2020: Version 3.20

In Databricks Runtime 7.0 and above, the underlying version of Apache Spark uses Scala 2.12. Because libraries compiled against Scala 2.11 can disable Databricks Runtime 7.0 clusters in unexpected ways, clusters running Databricks Runtime 7.0 and above do not install libraries configured to be installed on all clusters. The cluster Libraries tab shows a status of Skipped and a deprecation message related to the changes in library handling.

If a cluster was created on an earlier version of Databricks Runtime before 3.20 was released to your workspace, and you now edit that cluster to use Databricks Runtime 7.0, any libraries that were configured to be installed on all clusters will be installed on that cluster. In that case, incompatible JARs among the installed libraries can cause the cluster to be disabled. The workaround is either to clone the cluster or to create a new cluster.
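By convention, JVM artifacts built for Scala encode the Scala version in the artifact-name suffix (for example `_2.11` or `_2.12`), so a quick string check can flag cluster-wide libraries compiled against Scala 2.11 before a cluster is moved to Databricks Runtime 7.0. The artifact names below are made up for illustration.

```python
# Sketch: flag artifacts whose name suffix indicates Scala 2.11, which
# is incompatible with the Scala 2.12-based Databricks Runtime 7.0.
# Artifact names are illustrative; the "_2.xx" suffix is a common
# convention for Scala-compiled JVM libraries.

import re

def scala_version(artifact_name):
    """Return the Scala version suffix of an artifact name, or None."""
    m = re.search(r"_(2\.\d+)(?:-|$)", artifact_name)
    return m.group(1) if m else None

libraries = [
    "spark-xml_2.11-0.9.0",
    "spark-xml_2.12-0.9.0",
    "commons-lang3-3.9",       # plain Java library: no Scala suffix
]
incompatible = [a for a in libraries if scala_version(a) == "2.11"]
```

Pure Java libraries carry no Scala suffix and are unaffected by the Scala version change.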

Databricks Runtime 7.0 for Genomics (Beta)

May 21, 2020

Databricks Runtime 7.0 for Genomics is built on top of Databricks Runtime 7.0 and includes the following library changes:

  • The ADAM library has been updated from version 0.30.0 to 0.32.0.

  • The Hail library is not included in Databricks Runtime 7.0 for Genomics as there is no release based on Apache Spark 3.0.

Databricks Runtime 7.0 ML (Beta)

May 21, 2020

Databricks Runtime 7.0 ML is built on top of Databricks Runtime 7.0 and includes the following new features:

  • Notebook-scoped Python libraries and custom environments managed by conda and pip commands.

  • Updates for major Python packages including tensorflow, tensorboard, pytorch, xgboost, sparkdl, and hyperopt.

  • Newly added Python packages lightgbm, nltk, petastorm, and plotly.

  • RStudio Server Open Source v1.2.

For more information, see the complete Databricks Runtime 7.0 ML (unsupported) release notes.

Databricks Runtime 6.6 for Genomics (Beta)

May 7, 2020

Databricks Runtime 6.6 for Genomics is built on top of Databricks Runtime 6.6 and includes the following new features:

  • GFF3 reader

  • Custom reference genome support

  • Per-sample pipeline timeouts

  • BAM export option

  • Manifest blobs

Databricks Runtime 6.6 ML (Beta)

May 7, 2020

Databricks Runtime 6.6 ML is built on top of Databricks Runtime 6.6 and includes the following new features:

  • Upgraded mlflow: 1.7.0 to 1.8.0

For more information, see the complete Databricks Runtime 6.6 ML (unsupported) release notes.

Databricks Runtime 6.6 (Beta)

May 7, 2020

Databricks Runtime 6.6 (Beta) brings many library upgrades and new features, including the following Delta Lake features:

  • You can now evolve the schema of the table automatically with the merge operation. This is useful in scenarios where you want to upsert change data into a table and the schema of the data changes over time. Instead of detecting and applying schema changes before upserting, merge can simultaneously evolve the schema and upsert the changes. See Automatic schema evolution for Delta Lake merge.

  • The performance of merge operations that have only matched clauses (that is, only update and delete actions and no insert action) has been improved.

  • Parquet tables that are referenced in the Hive metastore are now convertible to Delta Lake through their table identifiers using CONVERT TO DELTA.

For more information, see the complete Databricks Runtime 6.6 (unsupported) release notes.

Job clusters now tagged with job name and ID

May 5-12, 2020: Version 3.19

Job clusters are automatically tagged with the job name and ID. The tags appear in the billable usage reports so that you can easily attribute your DBU usage by job and identify anomalies. The tags are sanitized to cluster tag specifications, such as allowed characters, maximum size, and maximum number of tags. The job name is contained in the RunName tag and the job ID is contained in the JobId tag.
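With the RunName and JobId tags in billable usage reports, attributing DBU usage per job reduces to a group-by over the tag values. The usage rows below are made-up sample records, not real report output.

```python
# Sketch: attribute DBU usage by job using the JobId cluster tag.
# The rows are illustrative sample records; real billable usage
# reports carry these tags per job cluster.

from collections import defaultdict

usage_rows = [
    {"tags": {"RunName": "nightly-etl", "JobId": "101"}, "dbus": 12.5},
    {"tags": {"RunName": "nightly-etl", "JobId": "101"}, "dbus": 7.5},
    {"tags": {"RunName": "ad-hoc-report", "JobId": "202"}, "dbus": 3.0},
    {"tags": {}, "dbus": 1.0},  # non-job cluster: no JobId tag
]

def dbus_by_job(rows):
    """Sum DBUs per JobId tag; untagged usage falls under 'untagged'."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["tags"].get("JobId", "untagged")] += row["dbus"]
    return dict(totals)

totals = dbus_by_job(usage_rows)
```

An unusually large total for a single JobId is the kind of anomaly this tagging is meant to surface.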

DBFS REST API delete endpoint size limit

May 5-12, 2020: Version 3.19

When you delete a large number of files recursively using the DBFS API, the delete operation is done in increments. The call returns a response after approximately 45 seconds with an error message asking you to re-invoke the delete operation until the directory structure is fully deleted. For example:

{
  "error_code": "PARTIAL_DELETE",
  "message": "The requested operation has deleted 324 files. There are more files remaining. You must make another request to delete more."
}
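The client-side pattern is therefore a loop that re-invokes delete until the response no longer reports PARTIAL_DELETE. The sketch below uses a fake `dbfs_delete` function standing in for a call to the DBFS API delete endpoint; it deletes a fixed batch per call and mimics the response shown above.

```python
# Sketch of the re-invoke-until-done loop for incremental DBFS deletes.
# `make_fake_dbfs` builds a hypothetical stand-in for the DBFS API
# delete endpoint; each call removes at most `batch` files and reports
# PARTIAL_DELETE while files remain.

def make_fake_dbfs(n_files, batch=100):
    state = {"remaining": n_files}
    def dbfs_delete(path, recursive=True):
        deleted = min(batch, state["remaining"])
        state["remaining"] -= deleted
        if state["remaining"] > 0:
            return {"error_code": "PARTIAL_DELETE",
                    "message": f"The requested operation has deleted "
                               f"{deleted} files. There are more files "
                               "remaining."}
        return {}  # no error code: delete complete
    return dbfs_delete

def delete_until_done(dbfs_delete, path):
    """Re-invoke delete until the directory structure is fully removed."""
    calls = 0
    while True:
        calls += 1
        resp = dbfs_delete(path, recursive=True)
        if resp.get("error_code") != "PARTIAL_DELETE":
            return calls

dbfs_delete = make_fake_dbfs(n_files=374, batch=100)
calls = delete_until_done(dbfs_delete, "/tmp/big-dir")
```

Here 374 files at 100 per call take four calls: three PARTIAL_DELETE responses followed by one clean response.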

Restore deleted notebook cells

May 5-12, 2020: Version 3.19

You can now restore deleted cells either by using the (Z) keyboard shortcut or by selecting Edit > Undo Delete Cells.

Jobs pending queue limit

May 5-12, 2020: Version 3.19

A workspace is now limited to 1000 active (running and pending) job runs. Because a workspace is limited to 150 concurrent (running) job runs, up to 850 runs can sit in the pending queue.
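The two limits above can be expressed as a small capacity check. The function name and constants are illustrative; only the limit values (1000 active, 150 concurrent) come from the release note.

```python
# Sketch of the workspace run limits: 1000 active (running + pending)
# runs, of which at most 150 may be running concurrently. Names are
# illustrative; the numeric limits are from the release note.

ACTIVE_LIMIT = 1000      # running + pending
CONCURRENT_LIMIT = 150   # running

def can_submit_run(running, pending):
    """True if one more run would stay within the active-run limit."""
    assert 0 <= running <= CONCURRENT_LIMIT
    return running + pending < ACTIVE_LIMIT

# With all 150 concurrent slots busy, up to 850 runs can wait in the queue.
max_pending = ACTIVE_LIMIT - CONCURRENT_LIMIT
```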