Databricks Runtime 7.4 for Genomics (Unsupported)

Databricks released this image in November 2020.

Databricks Runtime 7.4 for Genomics is a version of Databricks Runtime 7.4 (Unsupported) optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics.

Note

Databricks Runtime for Genomics is deprecated. Databricks is no longer building new Databricks Runtime for Genomics releases and will remove support for Databricks Runtime for Genomics on September 24, 2022, when Databricks Runtime for Genomics 7.3 LTS support ends. At that point Databricks Runtime for Genomics will no longer be available for selection when you create a cluster. For more information about the Databricks Runtime deprecation policy and schedule, see Supported Databricks runtime releases and support schedule.

For more information, including instructions for creating a Databricks Runtime for Genomics cluster, see Databricks Runtime for Genomics (Deprecated). For more information on developing genomics applications, see Genomics guide.

New features

Databricks Runtime 7.4 for Genomics is built on top of Databricks Runtime 7.4. For information on what’s new in Databricks Runtime 7.4, see the Databricks Runtime 7.4 (Unsupported) release notes.

GloWGR for binary traits

GloWGR can now fit whole genome regression models for binary traits.

Logistic regression function accepts offset parameter

The logistic_regression_gwas function now accepts an offset parameter. This parameter is equivalent to a feature with a fixed coefficient of 1. Both the likelihood ratio test and Firth penalized likelihood ratio test respect this parameter. The output of GloWGR should be passed as an offset.
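The equivalence between an offset and a feature with a fixed coefficient of 1 can be illustrated with a small, self-contained sketch. It does not call logistic_regression_gwas. The helper names and the numeric values are hypothetical, and the offset is assumed to be on the linear predictor (log-odds) scale.

```python
import math


def sigmoid(z):
    """Logistic function mapping a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_with_offset(x, beta, offset):
    """P(y = 1) for one sample: the offset is added to the linear predictor unscaled."""
    return sigmoid(x * beta + offset)


def predict_with_fixed_feature(features, coefficients):
    """The same model written as a plain dot product over all features."""
    return sigmoid(sum(f * c for f, c in zip(features, coefficients)))


# Hypothetical values, for illustration only.
genotype = 2.0   # genotype value being tested
beta = 0.3       # fitted effect of the genotype
offset = -1.2    # per-sample offset, e.g. a whole genome regression prediction

p_offset = predict_with_offset(genotype, beta, offset)
# Treat the offset as an extra feature whose coefficient is pinned to 1.
p_feature = predict_with_fixed_feature([genotype, offset], [beta, 1.0])
```

Both calls produce the same probability, because the offset contributes to the linear predictor directly and is never estimated.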

Hail support

Databricks Runtime 7.4 for Genomics is the first release in the 7.x line to package support for Hail.

Improvements

GloWGR convenience functions

The RidgeRegression and LogisticRegression classes in GloWGR now provide a transform_loco function to generate leave-one-chromosome-out (LOCO) predictions. In addition, GloWGR now includes a reshape_for_gwas function to reshape the predictions from GloWGR into a form that the association tests in Glow can accept.
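As a conceptual illustration of the LOCO idea only, the sketch below assumes a purely additive model in which each chromosome contributes a separate term to a sample's prediction. It is not the GloWGR implementation and does not use transform_loco. The function name and the input values are hypothetical.

```python
def loco_predictions(contributions):
    """Return one prediction per held-out chromosome.

    contributions maps a chromosome name to that chromosome's additive
    contribution to a single sample's prediction. The prediction for a
    chromosome is the sum of the contributions from every other chromosome.
    """
    total = sum(contributions.values())
    return {chrom: total - value for chrom, value in contributions.items()}


# Hypothetical per-chromosome contributions for one sample.
per_chromosome = {"1": 0.5, "2": -0.25, "3": 1.0}
loco = loco_predictions(per_chromosome)
```

When testing a variant on chromosome 1, the prediction keyed by "1" would be used, so the variant's own chromosome does not influence the adjustment.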

GloWGR usability improvements

GloWGR for quantitative and binary traits now provides better performance and clearer error messages in the case of validation failures.

Faster VCF reader

Databricks Runtime 7.4 for Genomics includes an experimental fast VCF reader. You can activate the new reader by setting the Spark configuration io.projectglow.vcf.fastReaderEnabled to true in a notebook or cluster configuration.
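A minimal sketch of setting this configuration from a notebook follows. The helper function and constant names are hypothetical. The only assumption about the session object is that it exposes conf.set(key, value), as a PySpark SparkSession does.

```python
# Configuration key named in these release notes.
FAST_VCF_READER_KEY = "io.projectglow.vcf.fastReaderEnabled"


def enable_fast_vcf_reader(spark):
    """Turn on the experimental fast VCF reader for the given Spark session."""
    spark.conf.set(FAST_VCF_READER_KEY, "true")


# In a Databricks notebook, `spark` is the predefined SparkSession:
# enable_fast_vcf_reader(spark)
```

Because the reader is experimental, it may be worth comparing its output against the default reader on a sample file before relying on it.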

Libraries

The following sections list the libraries included in Databricks Runtime 7.4 for Genomics that differ from those included in Databricks Runtime 7.4.

Upgraded libraries

  • Hail: 0.2.40 to 0.2.58

Packaged libraries

Library       Version
ADAM          0.32.0
GATK          4.1.4.1
Hail          0.2.58
Hadoop-bam    7.9.2
samtools      1.9
VEP           96