Databricks Runtime for Genomics (Deprecated)
Databricks Runtime for Genomics (Databricks Runtime Genomics) is a version of Databricks Runtime optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics. For more information on developing genomics applications, see the Genomics guide.
Important
This documentation has been retired and might not be updated. The products, services, or technologies mentioned in this content are no longer supported.
Databricks Runtime for Genomics has been deprecated. For open source equivalents, see the repositories for genomics-pipelines and Glow. The bioinformatics libraries that were part of the runtime have been released as a Docker container, which can be pulled from the ProjectGlow Docker Hub page.
For more information about the Databricks Runtime deprecation policy and schedule, see All supported Databricks Runtime releases.
What’s in Databricks Runtime for Genomics?
An optimized version of Glow, the open-source library developed by Databricks and Regeneron, with all of its functionality, as well as:
Spark SQL support for reading and writing variant data
Functions for common workflow elements
Optimizations for common query patterns
Turn-key pipelines parallelized with Apache Spark:
Popular open source libraries, optimized for performance and reliability:
ADAM
GATK
Hadoop-bam
Popular command line tools:
samtools
Reference data (GRCh37 or GRCh38, known SNP sites)
See the Databricks Runtime for Genomics release notes for a complete list of included libraries and versions.
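As a rough illustration of the Spark SQL support for variant data listed above, the following sketch reads a VCF file through Glow's `vcf` data source and registers it as a temporary view. It assumes an active `SparkSession` on a cluster where Glow is available; the function names, the file path, and the view name are hypothetical and chosen only for this example.

```python
def load_variants(spark, vcf_path, view_name="variants"):
    """Read a VCF file with Glow's "vcf" data source and expose it to Spark SQL.

    Assumptions: `spark` is an active SparkSession on a cluster where the
    Glow library is installed; `vcf_path` and `view_name` are hypothetical.
    """
    df = spark.read.format("vcf").load(vcf_path)
    # Register the DataFrame so it can be queried with spark.sql(...).
    df.createOrReplaceTempView(view_name)
    return df


def variants_per_contig_sql(view_name="variants"):
    """Build a query that counts variants per contig.

    `contigName` is a column in the schema produced by Glow's VCF reader.
    """
    return (
        f"SELECT contigName, COUNT(*) AS n_variants "
        f"FROM {view_name} GROUP BY contigName ORDER BY contigName"
    )
```

In a notebook this might be used as `load_variants(spark, "/path/to/sample.vcf")` followed by `spark.sql(variants_per_contig_sql()).show()`. Because the runtime is deprecated, the same calls are expected to work with the open source Glow library rather than depending on the Genomics runtime itself.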
Create a cluster using Databricks Runtime for Genomics
When you create a cluster, select a Databricks Runtime for Genomics version from the Databricks Runtime Version drop-down.
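The drop-down selection corresponds to the `spark_version` field of a cluster specification. For readers who script cluster creation instead of using the UI, the sketch below builds a minimal request body for the Databricks Clusters API (`POST /api/2.0/clusters/create`). The helper name, the cluster name, and the placeholder values are assumptions for illustration; the actual version key for a Genomics runtime is not given in this document and, since the runtime is deprecated, may no longer be selectable.

```python
import json


def build_cluster_spec(spark_version, node_type_id, num_workers=2,
                       cluster_name="genomics-cluster"):
    """Return a minimal cluster specification as a dict.

    `spark_version` is the runtime version key (the API equivalent of the
    Databricks Runtime Version drop-down). All argument values used below
    are placeholders, not real keys.
    """
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
    }


if __name__ == "__main__":
    spec = build_cluster_spec(
        spark_version="<genomics-runtime-version-key>",  # placeholder
        node_type_id="<node-type-id>",                   # placeholder
    )
    # Print the JSON body that would be sent to the clusters/create endpoint.
    print(json.dumps(spec, indent=2))
```

The function only assembles the payload; sending it requires a workspace URL and an access token, which are deliberately left out here.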