Databricks Runtime for Genomics (Deprecated)
Databricks Runtime for Genomics (Databricks Runtime Genomics) is a version of Databricks Runtime optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics. For more information on developing genomics applications, see the Genomics guide.
Note
Databricks Runtime for Genomics is deprecated. Databricks is no longer building new Databricks Runtime for Genomics releases and will remove support for Databricks Runtime for Genomics on September 24, 2022, when support for Databricks Runtime for Genomics 7.3 LTS ends. After that date, you will no longer be able to select Databricks Runtime for Genomics when you create a cluster. For more information about the Databricks Runtime deprecation policy and schedule, see Supported Databricks runtime releases and support schedule.
What’s in Databricks Runtime for Genomics?
- An optimized version of Glow, the open-source library developed by Databricks and Regeneron, with all of its functionality plus:
  - Spark SQL support for reading and writing variant data
  - Functions for common workflow elements
  - Optimizations for common query patterns
- Turn-key pipelines parallelized with Apache Spark:
- Hail 0.2 integration
- Popular open-source libraries, optimized for performance and reliability:
  - ADAM
  - GATK
  - Hadoop-BAM
- Popular command-line tools:
  - samtools
- Reference data (GRCh37 or GRCh38 reference genomes, known SNP sites)
See the Databricks Runtime for Genomics release notes for a complete list of included libraries and versions.
Requirements
Your Databricks workspace must have Databricks Runtime for Genomics enabled.
Create a cluster using Databricks Runtime for Genomics
When you create a cluster, select a Databricks Runtime for Genomics version from the Databricks Runtime Version drop-down.