Databricks Runtime 7.4 for Genomics (Unsupported)

Databricks released this image in November 2020.

Databricks Runtime 7.4 for Genomics is a version of Databricks Runtime 7.4 (Unsupported) optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics.

Important

This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported.

The Databricks Genomics runtime has been deprecated. For open source equivalents, see the genomics-pipelines and Glow repositories. Bioinformatics libraries that were part of the runtime have been released as a Docker container, which can be pulled from the ProjectGlow Docker Hub page.

For more information about the Databricks Runtime deprecation policy and schedule, see All supported Databricks Runtime releases.

For more information, including instructions for creating a Databricks Runtime for Genomics cluster and guidance on developing genomics applications, see Genomics guide.

New features

Databricks Runtime 7.4 for Genomics is built on top of Databricks Runtime 7.4. For information on what’s new in Databricks Runtime 7.4, see the Databricks Runtime 7.4 (Unsupported) release notes.

GloWGR for binary traits

GloWGR can now fit whole genome regression models for binary traits.

Logistic regression function accepts offset parameter

The logistic_regression_gwas function now accepts an offset parameter. This parameter is equivalent to a feature with a fixed coefficient of 1. Both the likelihood ratio test and Firth penalized likelihood ratio test respect this parameter. The output of GloWGR should be passed as an offset.
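The equivalence between an offset and a feature with a fixed coefficient of 1 can be illustrated with a minimal pure-Python sketch. This is not the Glow implementation and does not call logistic_regression_gwas; the function name predicted_probability and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def predicted_probability(features, coefficients, offset=0.0):
    """Logistic model: the offset is added to the linear predictor unscaled."""
    linear_predictor = sum(x * b for x, b in zip(features, coefficients)) + offset
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical values: one genotype feature, its coefficient, and an offset
# standing in for a per-sample prediction such as the GloWGR output.
genotype = 2.0
beta = 0.3
offset = -0.7

# Passing the value as an offset ...
with_offset = predicted_probability([genotype], [beta], offset=offset)

# ... gives the same probability as appending it as an extra feature whose
# coefficient is held fixed at 1 instead of being estimated.
as_fixed_feature = predicted_probability([genotype, offset], [beta, 1.0])

print(with_offset, as_fixed_feature)
```

Because the offset's coefficient is never estimated, it shifts each sample's baseline without adding a free parameter to the test.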

Hail support

Databricks Runtime 7.4 for Genomics is the first release in the 7.x line to package support for Hail.

Improvements

GloWGR convenience functions

The RidgeRegression and LogisticRegression classes in GloWGR now provide a transform_loco function to generate leave-one-chromosome-out (LOCO) predictions. In addition, GloWGR now includes a reshape_for_gwas function to reshape the predictions from GloWGR into a form that the association tests in Glow can accept.
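As a conceptual illustration of what a leave-one-chromosome-out prediction means, the sketch below assumes a simple additive model in which each chromosome contributes a per-sample term to the prediction. It does not use the GloWGR API; the function loco_predictions, the input layout, and the numbers are hypothetical.

```python
def loco_predictions(contributions):
    """Return, for each chromosome, per-sample predictions that exclude it.

    contributions maps a chromosome name to a list of per-sample terms
    (hypothetical layout; every list must have the same length).
    """
    chromosomes = list(contributions)
    n_samples = len(contributions[chromosomes[0]])
    predictions = {}
    for held_out in chromosomes:
        # Sum the terms from every chromosome except the held-out one.
        predictions[held_out] = [
            sum(contributions[c][i] for c in chromosomes if c != held_out)
            for i in range(n_samples)
        ]
    return predictions


# Hypothetical contributions for two samples across three chromosomes.
example = {
    "1": [1.0, 2.0],
    "2": [0.5, 0.5],
    "3": [2.0, 0.0],
}
loco = loco_predictions(example)
print(loco["1"])  # predictions to use when testing variants on chromosome 1
```

When testing a variant, the prediction that excludes the variant's own chromosome is the one to use, so that the variant's effect is not already absorbed by the prediction.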

GloWGR usability improvements

GloWGR for quantitative and binary traits now provides better performance and clearer error messages in the case of validation failures.

Faster VCF reader

Databricks Runtime 7.4 for Genomics includes an experimental fast VCF reader. You can activate the new reader by setting the Spark configuration io.projectglow.vcf.fastReaderEnabled to true in a notebook or cluster configuration.
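In a notebook, the setting can be applied to the active Spark session before reading. The following is a minimal sketch: the helper names enable_fast_vcf_reader and read_vcf are hypothetical, the path is supplied by the caller, and spark is assumed to be the SparkSession that Databricks notebooks provide.

```python
FAST_VCF_READER_CONF = "io.projectglow.vcf.fastReaderEnabled"


def enable_fast_vcf_reader(spark):
    """Turn on the experimental fast VCF reader for this Spark session."""
    spark.conf.set(FAST_VCF_READER_CONF, "true")


def read_vcf(spark, path):
    """Read a VCF file with the fast reader enabled (path is caller-supplied)."""
    enable_fast_vcf_reader(spark)
    # "vcf" is the data source name registered by Glow.
    return spark.read.format("vcf").load(path)
```

To apply the setting to every notebook attached to a cluster, the equivalent key and value (io.projectglow.vcf.fastReaderEnabled true) can instead be added to the cluster's Spark configuration.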

Libraries

The following sections list the libraries included in Databricks Runtime 7.4 for Genomics that differ from those included in Databricks Runtime 7.4.

Upgraded libraries

  • Hail: 0.2.40 to 0.2.58

Packaged libraries

Library       Version
ADAM          0.32.0
GATK          4.1.4.1
Hail          0.2.58
Hadoop-bam    7.9.2
samtools      1.9
VEP           96