Databricks Runtime 7.3 LTS for Genomics (Unsupported)
Databricks released this image in September 2020. It was declared Long Term Support (LTS) in October 2020.
Databricks Runtime 7.3 LTS for Genomics is a version of Databricks Runtime 7.3 LTS optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics.
Important
This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported.
The Databricks Genomics runtime has been deprecated. For open source equivalents, see repos for genomics-pipelines and Glow. Bioinformatics libraries that were part of the runtime have been released as a Docker container, which can be pulled from the ProjectGlow Dockerhub page.
For more information about the Databricks Runtime deprecation policy and schedule, see All supported Databricks Runtime releases.
For more information, including instructions for creating a Databricks Runtime for Genomics cluster and guidance on developing genomics applications, see Genomics guide.
For help with migration from Databricks Runtime 6.x to Databricks Runtime 7.3 LTS, see Databricks Runtime 7.x migration guide.
New features
Databricks Runtime 7.3 LTS for Genomics is built on top of Databricks Runtime 7.3 LTS. For information on what’s new in Databricks Runtime 7.3 LTS, see the Databricks Runtime 7.3 LTS release notes.
Support for reading BGEN files with uncompressed or zstd-compressed genotypes
Glow now supports reading BGEN files whose SNP block probability data is uncompressed or compressed using Zstandard’s ZSTD_compress() function, in addition to the existing support for reading data compressed using zlib’s compress() function.
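As a rough illustration of how a reader can tell these three cases apart: the BGEN format records the compression scheme in the two lowest bits of the header flags (0 for uncompressed, 1 for zlib, 2 for zstd). The sketch below uses only the Python standard library to read that field from raw header bytes. It is not Glow's implementation; the names bgen_compression and make_header are hypothetical, and make_header builds a synthetic header purely for demonstration.

```python
import struct

# Values of the CompressedSNPBlocks field (bits 0-1 of the BGEN header flags).
COMPRESSION_NAMES = {0: "uncompressed", 1: "zlib", 2: "zstd"}


def bgen_compression(data: bytes) -> str:
    """Return the genotype-block compression declared in a BGEN header."""
    if len(data) < 24:
        raise ValueError("too short to contain a BGEN header")
    # Bytes 0-3: offset to the first variant block; bytes 4-7: header length.
    _offset, header_len = struct.unpack_from("<II", data, 0)
    if header_len < 20 or len(data) < 4 + header_len:
        raise ValueError("truncated or malformed BGEN header")
    # The flags occupy the last four bytes of the header block.
    (flags,) = struct.unpack_from("<I", data, 4 + header_len - 4)
    code = flags & 0b11
    if code not in COMPRESSION_NAMES:
        raise ValueError(f"unknown compression code {code}")
    return COMPRESSION_NAMES[code]


def make_header(compression_code: int, layout: int = 2) -> bytes:
    """Build a minimal synthetic header (hypothetical helper, illustration only)."""
    # Layout is stored in bits 2-5 of the flags.
    flags = compression_code | (layout << 2)
    # Header block: length, variant count, sample count, magic bytes, flags.
    header = struct.pack("<III4sI", 20, 0, 0, b"bgen", flags)
    # Prefix with the 4-byte offset to the first variant block.
    return struct.pack("<I", 20) + header


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, bgen_compression(make_header(code)))
```

When reading through Glow, none of this needs to be done by hand: the BGEN data source (spark.read.format("bgen")) is expected to detect the compression scheme from the file itself.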
Improvements
Variant liftOver performance
Performing variant liftOver with Glow is now up to 12x faster.
Faster large file upload to ABFS
Writing large files (such as VCF, BGEN, and BAM) to the Azure Blob File System is now up to 2x faster.
Performance of DNASeq pipeline on autoscaling clusters
The DNASeq pipeline is now better tuned for autoscaling clusters.
Refactors
TNSeq pipeline renamed to MutSeq
The Tumor/Normal pipeline has been renamed from TNSeq to MutSeq.