February 2020

These features and Databricks platform improvements were released in February 2020.

Note

Releases are staged. Your Databricks account may not be updated until up to a week after the initial release date.

Databricks Runtime 6.4 for Genomics GA

February 26, 2020

Databricks Runtime 6.4 for Genomics is built on top of Databricks Runtime 6.4. It includes many improvements and upgrades from Databricks Runtime 6.3 for Genomics.

The key features are:

  • You can now customize DNASeq. Pipeline users can selectively disable any valid combination of the read alignment, variant calling, and variant annotation stages. Users can also perform single-end read alignment.
  • The version of Glow included in Databricks Runtime 6.4 for Genomics now provides Python and Scala APIs for functions previously exposed only via SQL expressions. These functions are available for DataFrame operations, providing improved compile-time safety.

For details, see the complete Databricks Runtime 6.4 for Genomics release notes.

Databricks Runtime 6.4 ML GA

February 26, 2020

Databricks Runtime 6.4 ML GA brings library upgrades, including:

  • PyTorch: 1.3.1 to 1.4.0
  • Horovod: 0.18.2 to 0.19.0

For details, see the complete Databricks Runtime 6.4 ML release notes.

Databricks Runtime 6.4 GA

February 26, 2020

Databricks Runtime 6.4 GA brings new features, improvements, and many bug fixes.

  • Process new data files incrementally with Auto Loader (Public Preview). Auto Loader gives you a more efficient way to process new data files incrementally as they arrive on a cloud blob store during ETL. This is an improvement over file-based structured streaming, which identifies new files by repeatedly listing the cloud directory and tracking the files it has already seen, an approach that becomes very inefficient as the directory grows.
  • Load data into Delta Lake with idempotent retries (Public Preview). The COPY INTO SQL command lets you load data into Delta Lake with idempotent retries. Until now, loading data into Delta Lake required the Apache Spark DataFrame APIs, and you had to handle any failures during loads yourself.
  • Operation metrics for all writes, updates, and deletes on a Delta table now shown in table history.
  • Inline Matplotlib figures now enabled by default in Databricks notebooks (Public Preview).
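
The two ingestion features above can be sketched in Python as follows. This is a minimal illustration, not the definitive usage: the helper names auto_loader_stream and copy_into_statement and all paths are hypothetical, and the sketch assumes a Databricks notebook where a spark session is already available. The cloudFiles source, the cloudFiles.format option, and the COPY INTO ... FROM ... FILEFORMAT form are taken from the Auto Loader and COPY INTO documentation.

```python
def auto_loader_stream(spark, source_dir, schema, file_format="json"):
    """Return a streaming DataFrame that reads new files through Auto Loader.

    `spark` is assumed to be the SparkSession provided by a Databricks notebook.
    `schema` may be a StructType or a DDL string such as "id INT, ts TIMESTAMP".
    """
    return (
        spark.readStream
        .format("cloudFiles")                      # Auto Loader source
        .option("cloudFiles.format", file_format)  # format of the incoming files
        .schema(schema)                            # schema is supplied up front
        .load(source_dir)                          # directory to watch
    )


def copy_into_statement(target_path, source_dir, file_format="JSON"):
    """Build a COPY INTO statement for a path-based Delta table.

    Re-running the same statement is meant to be safe, because files that
    were already loaded are skipped.
    """
    return (
        f"COPY INTO delta.`{target_path}`\n"
        f"FROM '{source_dir}'\n"
        f"FILEFORMAT = {file_format}"
    )


# Hypothetical paths, for illustration only.
statement = copy_into_statement("/mnt/delta/events", "/mnt/landing/events")
print(statement)
```

In a notebook, the statement would be run with spark.sql(statement), and the stream returned by auto_loader_stream would be written out with writeStream as with any other streaming DataFrame.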

For details, see the complete Databricks Runtime 6.4 release notes.

The Clusters and Jobs UIs now reflect new cluster terminology and cluster pricing

February 25 - March 3, 2020: Version 3.14

The Clusters page and the Configure Cluster UI for jobs have been updated to help you identify the type of cluster you are using for a job and the pricing for each type of cluster.

On the Clusters page, the list headings have been renamed:

  • Interactive Clusters has been renamed to All-Purpose Clusters to reflect that you can use such clusters for any type of workload.
  • Automated Clusters has been renamed to Job Clusters to reflect that you can use these clusters only for jobs.

When you configure a cluster for a job, the Cluster Type options have been renamed:

  • New Automated Cluster has been renamed to New Job Cluster to reflect the fact that if you choose a new cluster it is charged at the job rate.
  • Existing Interactive Cluster has been renamed to Existing All-Purpose Cluster to reflect the fact that if you choose an existing cluster it is charged at the all-purpose rate.
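
The same choice exists when a job is defined through the Jobs API rather than the UI. The sketch below is an illustration under stated assumptions: job_settings is a hypothetical helper, and the job name, notebook path, node type, and cluster ID are placeholders. The field names new_cluster, existing_cluster_id, and notebook_task follow the Jobs API 2.0 job settings.

```python
def job_settings(name, notebook_path, new_cluster=None, existing_cluster_id=None):
    """Build a Jobs API style settings dict with exactly one cluster choice."""
    if (new_cluster is None) == (existing_cluster_id is None):
        raise ValueError("specify exactly one of new_cluster or existing_cluster_id")
    settings = {
        "name": name,
        "notebook_task": {"notebook_path": notebook_path},
    }
    if new_cluster is not None:
        # Corresponds to "New Job Cluster" in the UI (charged at the job rate).
        settings["new_cluster"] = new_cluster
    else:
        # Corresponds to "Existing All-Purpose Cluster" (charged at the all-purpose rate).
        settings["existing_cluster_id"] = existing_cluster_id
    return settings


# Placeholder values, for illustration only.
job_cluster_job = job_settings(
    "nightly-etl",
    "/Jobs/nightly_etl",
    new_cluster={
        "spark_version": "6.4.x-scala2.11",
        "node_type_id": "i3.xlarge",
        "num_workers": 2,
    },
)
all_purpose_job = job_settings(
    "nightly-etl",
    "/Jobs/nightly_etl",
    existing_cluster_id="1234-567890-abcde123",
)
```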

New interactive charts offer rich client-side interactions

February 25 - March 3, 2020: Version 3.14

This release introduces two new interactive chart types that replace the bar chart and line chart implementations. In addition to existing chart functionality, the line chart has a few new custom plot options: setting a Y-axis range, showing or hiding markers, and applying log scale to the Y-axis. Both charts have a built-in toolbar that supports a rich set of client-side interactions.

Chart toolbar

If you want to use the existing chart implementations, you can select them from the Legacy Charts drop-down menu. Existing charts will continue to use the previously available implementations.

Legacy chart types

New data ingestion network adds partner integrations with Delta Lake (Public Preview)

February 24, 2020

Now you can easily populate your “lakehouse”—your data lake empowered by the kinds of data structures and data management features you typically get with a data warehouse—from hundreds of data sources into Delta Lake. At the heart of this network is the new Partner Integrations gallery, accessible from your workspace and providing access to a huge network of data sources via our partners Fivetran, Qlik, Infoworks, StreamSets, and Syncsort.

Partner integrations portal

For an overview, see our blog. For details, see Partner data integrations.

Flags to manage workspace security and notebook features now available

February 4-11, 2020: Version 3.12

This release introduces new flags for managing the security headers that are sent to prevent attacks on your workspace, as well as access to notebook results downloads and Git versioning. See Manage workspace security headers and Manage access to notebook features. All of these administrative options are enabled by default.