Databricks Runtime for Genomics (Databricks Runtime Genomics) is a version of Databricks Runtime optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics. For more information on developing genomics applications, see Genomics.
Databricks Runtime for Genomics is generally available (GA) beginning with version 6.0.
Databricks Runtime for Genomics includes:
- An optimized version of Glow, the open-source library developed by Databricks and Regeneron, with all of its functionality, plus:
  - Spark SQL support for reading and writing variant data
  - Functions for common workflow elements
  - Optimizations for common query patterns
- Turn-key pipelines parallelized with Apache Spark
- Hail 0.2 integration
- Popular open source libraries, optimized for performance and reliability
- Popular command line tools
- Reference data (GRCh37 or GRCh38, known SNP sites)
See the Databricks Runtime for Genomics release notes for a complete list of included libraries and versions.
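As a rough illustration of the Spark SQL support for variant data listed above, the sketch below reads a VCF file into a DataFrame and writes it back out. It uses the `vcf` and `bigvcf` data source names from open-source Glow and assumes a notebook attached to a Databricks Runtime for Genomics cluster, where a `spark` session is already defined. The helper names `read_vcf` and `write_vcf` are hypothetical and exist only for this example.

```python
def read_vcf(spark, path):
    """Load a VCF file as a Spark DataFrame, one row per variant.

    `spark` is an existing SparkSession (predefined in Databricks notebooks).
    Assumes the Glow "vcf" data source is available on the cluster.
    """
    return spark.read.format("vcf").load(path)


def write_vcf(df, path, single_file=False):
    """Write a variant DataFrame back out as VCF.

    Assumption: "vcf" writes sharded output (one file per partition),
    while "bigvcf" writes a single VCF file, as in open-source Glow.
    Returns the data source name that was used.
    """
    fmt = "bigvcf" if single_file else "vcf"
    df.write.format(fmt).save(path)
    return fmt
```

In a notebook this might be called as `df = read_vcf(spark, "<path-to-vcf>")` followed by `write_vcf(df, "<output-path>")`, with both paths replaced by locations in your own workspace.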
Your Databricks workspace must have Databricks Runtime for Genomics enabled.
When you create a cluster, select a Databricks Runtime for Genomics version from the Databricks Runtime Version drop-down.
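The same choice can also be expressed when creating a cluster programmatically. The following is a minimal sketch of a request body for the Databricks Clusters API, in which the `spark_version` field plays the role of the Databricks Runtime Version drop-down. The function name `genomics_cluster_spec` is hypothetical, and the runtime version key and node type are deliberately left as placeholders: the actual values depend on your workspace and cloud provider.

```python
import json


def genomics_cluster_spec(name, spark_version, node_type_id, num_workers=2):
    """Build a cluster-creation request body as a plain dict.

    `spark_version` should be the version key of a Databricks Runtime for
    Genomics release available in your workspace (placeholder below).
    """
    if num_workers < 0:
        raise ValueError("num_workers must be non-negative")
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
    }


# Placeholder values: substitute a real genomics runtime key and node type.
spec = genomics_cluster_spec(
    name="genomics-demo",
    spark_version="<genomics-runtime-version-key>",
    node_type_id="<node-type-id>",
)
print(json.dumps(spec, indent=2))
```

Building the body as a plain dictionary keeps the sketch independent of any particular client library; it can be serialized and sent with whatever HTTP client or CLI you already use.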