Databricks Runtime for Genomics
Databricks Runtime for Genomics (Databricks Runtime Genomics) is a version of Databricks Runtime optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics. For more information on developing genomics applications, see the Genomics guide.
What’s in Databricks Runtime for Genomics?
- An optimized version of Glow, the Databricks-Regeneron open-source library, with all of its functionality plus:
  - Spark SQL support for reading and writing variant data
  - Functions for common workflow elements
  - Optimizations for common query patterns
- Turn-key pipelines parallelized with Apache Spark
- Hail 0.2 integration
- Popular open source libraries, optimized for performance and reliability:
  - ADAM
  - GATK
  - Hadoop-bam
- Popular command line tools:
  - samtools
- Reference data (GRCh37 or GRCh38, known SNP sites)
See the Databricks Runtime for Genomics release notes for a complete list of included libraries and versions.
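As an illustration of the Spark SQL support for variant data listed above, the following is a minimal sketch, not an official example. It assumes the `vcf` data source name and the column names (`contigName`, `start`, `referenceAllele`, `alternateAlleles`) used by open-source Glow. The helper names, the view name, and the file path are hypothetical.

```python
def load_variants(spark, vcf_path):
    """Read a VCF file into a Spark DataFrame.

    Assumes the "vcf" data source from Glow is available on the cluster.
    `spark` is the active SparkSession (predefined in Databricks notebooks).
    """
    return spark.read.format("vcf").load(vcf_path)


def variants_on_contig_query(view_name, contig):
    """Build a Spark SQL query selecting core variant fields for one contig.

    Column names follow the Glow VCF schema; the view name is whatever
    the caller registered with createOrReplaceTempView. The contig value
    is interpolated directly, so pass only trusted input in this sketch.
    """
    return (
        "SELECT contigName, start, referenceAllele, alternateAlleles "
        f"FROM {view_name} "
        f"WHERE contigName = '{contig}'"
    )


# Example use inside a notebook (hypothetical path and view name):
#   df = load_variants(spark, "/path/to/sample.vcf")
#   df.createOrReplaceTempView("variants")
#   spark.sql(variants_on_contig_query("variants", "20")).show()
```

Keeping the reader and the query as separate functions lets the same query run against variant data loaded from any source that exposes these columns.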
Requirements
Your Databricks workspace must have Databricks Runtime for Genomics enabled.
Create a cluster using Databricks Runtime for Genomics
When you create a cluster, select a Databricks Runtime for Genomics version from the Databricks Runtime Version drop-down.
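The same selection can, in principle, be made programmatically. The sketch below builds a request body for the Databricks Clusters API `clusters/create` endpoint, where the `spark_version` field carries the runtime version. The version key and node type are deliberately left as placeholders because they depend on the workspace and cloud provider, and the helper name `build_cluster_spec` is hypothetical.

```python
import json


def build_cluster_spec(cluster_name, runtime_version_key, node_type_id, num_workers):
    """Return a request body for the Clusters API create call.

    `runtime_version_key` should be the key of a Databricks Runtime for
    Genomics version available in your workspace (placeholder below).
    """
    return {
        "cluster_name": cluster_name,
        "spark_version": runtime_version_key,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
    }


if __name__ == "__main__":
    spec = build_cluster_spec(
        cluster_name="genomics-example",
        runtime_version_key="<genomics-runtime-version-key>",  # placeholder
        node_type_id="<node-type-id>",                         # placeholder
        num_workers=2,
    )
    # Print the JSON body; sending it to the API is left to the caller.
    print(json.dumps(spec, indent=2))
```

The sketch stops at building the payload so that it makes no assumptions about authentication or the workspace URL.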