Databricks Runtime for Genomics (Databricks Runtime Genomics) is a version of Databricks Runtime optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics. For more information on developing genomics applications, see Genomics.
Databricks Runtime for Genomics is generally available (GA) beginning with version 6.0.
Databricks Runtime for Genomics includes:
- An optimized version of the Databricks-Regeneron open-source library Glow, with all of its functionality as well as:
  - Spark SQL support for reading and writing variant data
  - Functions for common workflow elements
  - Optimizations for common query patterns
- Turn-key pipelines parallelized with Apache Spark
- Hail 0.2 integration
- Popular open source libraries, optimized for performance and reliability:
  - ADAM v0.25.0
  - GATK v126.96.36.199
  - Hadoop-bam v7.9.2
- Popular command line tools:
  - samtools v1.9
- Reference data (GRCh37 or GRCh38, known SNP sites)
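As an illustration of the Spark SQL support for reading variant data, the following is a minimal sketch that loads a VCF file through Glow's `vcf` data source. It assumes a cluster running Databricks Runtime for Genomics, where that data source is already registered. The helper names `load_variants` and `variants_on_contig` and the file path are hypothetical, chosen only for this example.

```python
# Sketch only: assumes a cluster running Databricks Runtime for Genomics,
# where Glow's "vcf" data source is available to Spark.
# The helper names below are hypothetical, not part of any library.


def load_variants(spark, vcf_path):
    """Read a VCF file into a Spark DataFrame with one row per variant."""
    return (
        spark.read
        .format("vcf")                     # Glow's VCF reader
        .option("includeSampleIds", True)  # include a sampleId in each genotype
        .load(vcf_path)
    )


def variants_on_contig(df, contig):
    """Keep only variants on one contig and select basic site-level columns."""
    return (
        df.where(df.contigName == contig)
        .select("contigName", "start", "end",
                "referenceAllele", "alternateAlleles")
    )
```

In a notebook, where a `spark` session is already defined, this might be used as `variants_on_contig(load_variants(spark, "/path/to/sample.vcf"), "20").show()`, with the path replaced by a real VCF location.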
A workspace administrator must enable Databricks Runtime for Genomics for it to appear in the Databricks Runtime Version drop-down when configuring a cluster.
- Go to the Admin Console.
- Click the Advanced tab.
- Click the Enable button next to Databricks Runtime for Genomics.
When you create a cluster, select a Databricks Runtime for Genomics version from the Databricks Runtime Version drop-down.
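If you create clusters programmatically rather than through the UI, the same choice is expressed in the `spark_version` field of the Clusters API request. The sketch below only builds the JSON body for `POST /api/2.0/clusters/create`; it does not send it. The function name `genomics_cluster_spec` is hypothetical, and the runtime key and node type shown in the usage note are placeholders, not real values. The valid version keys for a workspace can be listed with `GET /api/2.0/clusters/spark-versions`.

```python
import json


def genomics_cluster_spec(cluster_name, spark_version, node_type_id, num_workers):
    """Build a request body for the Clusters API create call.

    spark_version must be the key of a Databricks Runtime for Genomics
    version that is enabled in the workspace (assumption: the caller
    looks this key up; it is not hard-coded here).
    """
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,  # selects the runtime, like the drop-down
        "node_type_id": node_type_id,
        "num_workers": num_workers,
    }


def to_request_body(spec):
    """Serialize the spec to the JSON string sent in the request."""
    return json.dumps(spec, indent=2, sort_keys=True)
```

For example, `to_request_body(genomics_cluster_spec("genomics-demo", "<genomics-runtime-version-key>", "<node-type-id>", 2))` yields a body that could be posted with any HTTP client, once the placeholders are replaced with values from your workspace.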