Databricks Runtime 7.3 LTS for Genomics
Databricks released this image in September 2020. It was declared Long Term Support (LTS) in October 2020.
Databricks Runtime 7.3 LTS for Genomics is a version of Databricks Runtime 7.3 LTS optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics.
Note
Databricks Runtime for Genomics is deprecated. Databricks is no longer building new Databricks Runtime for Genomics releases and will remove support for Databricks Runtime for Genomics on September 24, 2022, when Databricks Runtime for Genomics 7.3 LTS support ends. At that point, Databricks Runtime for Genomics will no longer be available for selection when you create a cluster. For more information about the Databricks Runtime deprecation policy and schedule, see Supported Databricks runtime releases and support schedule. Bioinformatics libraries that were part of the runtime have been released as Docker containers, which you can find on the ProjectGlow Docker Hub page.
For more information, including instructions for creating a Databricks Runtime for Genomics cluster, see Databricks Runtime for Genomics (Deprecated). For more information on developing genomics applications, see Genomics guide.
For help with migration from Databricks Runtime 6.x to Databricks Runtime 7.3 LTS, see Databricks Runtime 7.x migration guide.
New features
Databricks Runtime 7.3 LTS for Genomics is built on top of Databricks Runtime 7.3 LTS. For information on what’s new in Databricks Runtime 7.3 LTS, see the Databricks Runtime 7.3 LTS release notes.
Support for reading BGEN files with uncompressed or zstd-compressed genotypes
Glow now supports reading BGEN files containing SNP block probability data that is uncompressed or compressed using zstandard’s ZSTD_compress() function, in addition to the existing support for reading data compressed using zlib’s compress() function.
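To check which of the three compression schemes a given BGEN file uses, you can inspect the flags field in its header. The following is a minimal sketch that is independent of Glow and assumes the header layout described in the public BGEN specification (little-endian fields, with the compression code stored in the two lowest bits of the flags). The names bgen_compression and make_header are hypothetical helpers introduced for this illustration, and make_header only builds a synthetic header for the demonstration.

```python
import struct

# CompressedSNPBlocks codes, assumed from the BGEN specification (flag bits 0-1).
COMPRESSION_NAMES = {0: "uncompressed", 1: "zlib", 2: "zstd"}


def bgen_compression(data: bytes) -> str:
    """Return the compression scheme named in a BGEN header.

    `data` must contain the first 4 bytes of the file plus the full header block.
    """
    if len(data) < 24:
        raise ValueError("too short to contain a BGEN header")
    # Bytes 0-3 hold the offset to the first variant block (unused here).
    # The header block follows: header length, variant count, sample count.
    header_len, _variants, _samples = struct.unpack_from("<III", data, 4)
    magic = data[16:20]
    if magic not in (b"bgen", b"\x00\x00\x00\x00"):
        raise ValueError("magic bytes do not match a BGEN file")
    if header_len < 20 or len(data) < 4 + header_len:
        raise ValueError("header block is truncated")
    # The flags field is the last 4 bytes of the header block.
    (flags,) = struct.unpack_from("<I", data, header_len)
    code = flags & 0b11
    if code not in COMPRESSION_NAMES:
        raise ValueError(f"unknown compression code {code}")
    return COMPRESSION_NAMES[code]


def make_header(compression_code: int, layout: int = 2, free_data: bytes = b"") -> bytes:
    """Build a synthetic BGEN header (hypothetical helper, for demonstration only)."""
    header_len = 20 + len(free_data)
    flags = compression_code | (layout << 2)
    header = (
        struct.pack("<III", header_len, 0, 0)  # length, 0 variants, 0 samples
        + b"bgen"
        + free_data
        + struct.pack("<I", flags)
    )
    # With no sample identifier block, the offset equals the header length.
    return struct.pack("<I", header_len) + header


if __name__ == "__main__":
    for code in (0, 1, 2):
        print(code, bgen_compression(make_header(code)))
```

For a real file, passing the first few kilobytes read in binary mode (for example, open(path, "rb").read(4096)) should be enough in the common case where the header's free data area is small.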
Improvements
Variant liftOver performance
Performing variant liftOver with Glow is now up to 12x faster.
Faster big file upload to ABFS
Writing large files (such as VCF, BGEN, and BAM) to the Azure Blob File System (ABFS) is now up to 2x faster.
Performance of DNASeq pipeline on autoscaling clusters
The DNASeq pipeline is now better tuned for autoscaling clusters.
Refactors
TNSeq pipeline renamed to MutSeq
The Tumor/Normal pipeline has been renamed from TNSeq to MutSeq.