Databricks Runtime 7.1 for Genomics (Unsupported)

Databricks released this image in July 2020.

Databricks Runtime 7.1 for Genomics is a version of Databricks Runtime 7.1 (Unsupported) optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics.

For more information, including instructions for creating a Databricks Runtime for Genomics cluster, see Databricks Runtime for Genomics (Deprecated). For more information on developing genomics applications, see Genomics guide.

New features

Databricks Runtime 7.1 for Genomics is built on top of Databricks Runtime 7.1. For information on what’s new in Databricks Runtime 7.1, see the Databricks Runtime 7.1 (Unsupported) release notes.

LOCO transformation

Glow now provides a transform_loco function that performs the GloWGR ridge regression transformation using a leave-one-chromosome-out (LOCO) strategy. Producing a separate set of predicted phenotype values for each chromosome, each from a model that excludes that chromosome, avoids proximal contamination during downstream association testing. The GloWGR documentation demonstrates the new usage.
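To illustrate the LOCO idea, here is a minimal NumPy sketch. It is not Glow's implementation: the functions ridge_fit and loco_predictions are hypothetical, and the sketch fits ridge regression directly on a feature matrix, whereas GloWGR applies LOCO to its block-level ridge predictions. For each chromosome, the model is fit only on features from the other chromosomes, so the estimate used when testing that chromosome's variants contains no signal from them.

```python
import numpy as np


def ridge_fit(X, y, alpha):
    # Closed-form ridge regression: solve (X^T X + alpha * I) beta = X^T y.
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def loco_predictions(X, y, chrom_of_feature, alpha=1.0):
    """Hypothetical helper: for each chromosome c, fit ridge on the features
    from all other chromosomes and return the predicted phenotype values.

    X: (n_samples, n_features) matrix; y: (n_samples,) phenotype;
    chrom_of_feature: chromosome label for each column of X.
    """
    chrom_of_feature = np.asarray(chrom_of_feature)
    preds = {}
    for c in np.unique(chrom_of_feature):
        keep = chrom_of_feature != c  # leave chromosome c out
        beta = ridge_fit(X[:, keep], y, alpha)
        preds[str(c)] = X[:, keep] @ beta
    return preds
```

Because each chromosome's estimate never sees that chromosome's features, changing those features leaves its estimate unchanged, which is the property that prevents proximal contamination.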

GloWGR output reshaping function

Glow now provides a reshape_for_gwas function that converts the phenotype estimates output by GloWGR from a pandas DataFrame into a Spark DataFrame compatible with the Glow genome-wide association study (GWAS) regression functions. The GloWGR documentation reflects the new usage.
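The following pandas-only sketch shows the kind of reshaping involved; it is not the Glow function and does not create a Spark DataFrame. It assumes the GloWGR estimates arrive as a wide table with one row per sample and one column per phenotype, and that the GWAS functions expect one row per phenotype with that phenotype's values collected into an ordered array. The helper name reshape_for_gwas_sketch and the label and values column names are assumptions for illustration; see the GloWGR documentation for the actual signature and output schema.

```python
import pandas as pd


def reshape_for_gwas_sketch(label_df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical illustration: turn a samples x phenotypes table into one
    row per phenotype, holding its values as a list ordered by sample
    (the row order of label_df)."""
    rows = [
        {"label": col, "values": label_df[col].astype(float).tolist()}
        for col in label_df.columns
    ]
    return pd.DataFrame(rows, columns=["label", "values"])


# Usage: two phenotype estimates for three samples.
estimates = pd.DataFrame(
    {"BMI": [0.1, -0.2, 0.3], "LDL": [1.0, 0.5, -0.5]},
    index=["s1", "s2", "s3"],
)
reshaped = reshape_for_gwas_sketch(estimates)
```

Keeping the sample order fixed across phenotypes matters because each values array must line up with the genotype arrays supplied to the regression.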

Improvements

RNASeq outputs unpaired alignments

The RNASeq pipeline now outputs unpaired alignments from STAR. Previously, these were dropped and only paired alignments were output.

Libraries

The following sections list the libraries included in Databricks Runtime 7.1 for Genomics that differ from those included in Databricks Runtime 7.1.

Packaged libraries

Library      Version
ADAM         0.32.0
GATK         4.1.4.1
Hadoop-bam   7.9.2
samtools     1.9
VEP          96