The Databricks Runtime for Health and Life Sciences (Databricks Runtime HLS) is a version of Databricks Runtime optimized for working with genomic and biomedical data. It is a component of the Unified Analytics Platform for Genomics.
Databricks Runtime HLS includes:
- Turn-key pipelines parallelized with Apache Spark
- Spark SQL optimizations for common query patterns
- Spark SQL support for reading and writing genotype data
- Hail 0.2 integration
- Popular open source libraries, optimized for performance and reliability:
  - ADAM v0.25.0
  - GATK v22.214.171.124
  - Hadoop-BAM v7.9.2
- Reference data (GRCh37 or GRCh38, known SNP sites)
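To illustrate the Spark SQL support for genotype data listed above, the following is a minimal sketch of loading a VCF file into a DataFrame and exposing it to SQL queries. The data source name `com.databricks.vcf`, the helper `load_variants`, and the view name `variants` are assumptions made for illustration and are not confirmed by this page; check the documentation for your runtime version before relying on them.

```python
def load_variants(spark, path, view_name="variants", source="com.databricks.vcf"):
    """Load genotype data from `path` and register it as a temporary SQL view.

    `spark` is an active SparkSession (provided as `spark` in a notebook).
    `source` is an ASSUMED data source name; replace it with the name
    documented for your Databricks Runtime HLS version.
    """
    # Read the file through the Spark SQL data source API.
    df = spark.read.format(source).load(path)
    # Make the DataFrame queryable from Spark SQL under `view_name`.
    df.createOrReplaceTempView(view_name)
    return df


# Example use in a notebook (the path is a placeholder):
#   df = load_variants(spark, "/path/to/sample.vcf")
#   spark.sql("SELECT COUNT(*) FROM variants").show()
```

Keeping the source name as a parameter makes it easy to switch formats without changing the calling code.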
When you create a cluster, select a Databricks Runtime HLS version from the Databricks Runtime Version drop-down.
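If you create clusters programmatically rather than through the drop-down, the same choice is expressed through the `spark_version` field of a cluster specification. The sketch below assumes you already have the parsed JSON list of available runtime versions (a `versions` array of `key` and `name` entries, as returned by the Clusters API) and that HLS runtimes can be recognized by "HLS" in their display name, which is an assumption. The helper names `hls_runtime_keys` and `cluster_spec` are hypothetical.

```python
def hls_runtime_keys(spark_versions_response):
    """Return the keys of runtime versions whose display name mentions HLS.

    ASSUMPTION: HLS runtimes carry "HLS" in their `name` field.
    """
    return [
        version["key"]
        for version in spark_versions_response.get("versions", [])
        if "hls" in version.get("name", "").lower()
    ]


def cluster_spec(spark_version, node_type_id, num_workers, cluster_name):
    """Build a cluster specification that pins the chosen runtime version."""
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,  # the runtime key selected above
        "node_type_id": node_type_id,
        "num_workers": num_workers,
    }
```

The resulting dictionary can be sent as the request body when creating a cluster. Node type identifiers depend on your cloud provider, so they are left as a parameter here.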