Databricks released this image in December 2019.
Databricks Runtime for Genomics (Databricks Runtime Genomics) is a variant of Databricks Runtime 6.2 optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics.
For more information, including instructions for creating a Databricks Runtime for Genomics cluster, see Databricks Runtime for Genomics. For more information on developing genomics applications, see Genomics.
Databricks Runtime 6.2 for Genomics is built on top of Databricks Runtime 6.2. For information on what’s new in Databricks Runtime 6.2, see the Databricks Runtime 6.2 release notes.
You can aggregate over genotypes for each sample in a DataFrame using aggregate_by_index. This function allows you to compute per-sample quality control (QC) metrics that are included in built-in QC functions.
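The idea behind per-sample aggregation can be sketched in plain Python. This is only an illustration of the semantics, not the Spark function itself: the helper name `aggregate_by_index_sketch`, its argument order, and the genotype layout (a `calls` list in which `-1` marks a missing allele) are assumptions made for this example. Each row holds one variant's genotypes, ordered by sample, and slot `i` of the result accumulates over sample `i` across all rows.

```python
from functools import reduce


def aggregate_by_index_sketch(rows, zero, update, merge, partitions=2):
    """Aggregate element i of every row's array into slot i of the result."""
    # Split the rows into chunks to mimic partitions that are processed independently.
    chunks = [rows[i::partitions] for i in range(partitions)]
    partials = []
    for chunk in chunks:
        if not chunk:
            continue
        acc = [zero] * len(chunk[0])
        for row in chunk:
            # Fold each sample's genotype into that sample's running value.
            acc = [update(a, g) for a, g in zip(acc, row)]
        partials.append(acc)
    if not partials:
        return []
    # Combine the per-partition buffers slot by slot.
    return reduce(lambda x, y: [merge(a, b) for a, b in zip(x, y)], partials)


def is_het(genotype):
    """True when both alleles are called and they differ."""
    calls = genotype["calls"]
    return -1 not in calls and calls[0] != calls[1]


# Three variants (rows) by three samples (array positions); hypothetical data.
variants = [
    [{"calls": [0, 1]}, {"calls": [1, 1]}, {"calls": [-1, -1]}],
    [{"calls": [0, 0]}, {"calls": [0, 1]}, {"calls": [0, 1]}],
    [{"calls": [0, 1]}, {"calls": [-1, -1]}, {"calls": [1, 1]}],
]

het_counts = aggregate_by_index_sketch(
    variants,
    zero=0,
    update=lambda count, g: count + (1 if is_het(g) else 0),
    merge=lambda a, b: a + b,
)
print(het_counts)  # [2, 1, 1]
```

The split into an `update` step and a `merge` step is what allows partial results from different partitions to be combined. For the exact signature of `aggregate_by_index`, consult the function's own documentation.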
The overhead of the pipe transformer has been reduced by roughly half. This speedup means that you can use Databricks Runtime for Genomics to parallelize command-line tools without sacrificing per-core efficiency.
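As a rough illustration of what piping data through a command-line tool involves, the sketch below feeds each partition's lines to a subprocess on standard input and collects its standard output. It does not use the pipe transformer's API. The `pipe_partition` helper is hypothetical, and the "tool" is a stand-in that upper-cases its input, chosen so the example runs anywhere Python is installed.

```python
import subprocess
import sys


def pipe_partition(lines, cmd):
    """Feed one partition's lines to a command on stdin and return its stdout lines."""
    result = subprocess.run(
        cmd,
        input="\n".join(lines) + "\n",
        capture_output=True,
        text=True,
        check=True,  # raise if the tool exits with a non-zero status
    )
    return result.stdout.splitlines()


# Hypothetical stand-in for a command-line tool: upper-case every input line.
TOOL = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]

# Two partitions of tab-separated records; in a cluster each would run on its own core.
partitions = [["chr1\t100\ta", "chr1\t200\tc"], ["chr2\t50\tg"]]
piped = [pipe_partition(p, TOOL) for p in partitions]
print(piped)
```

Each partition pays for serializing its rows to text, starting the process, and parsing the output, so the overhead of this wrapping step directly affects per-core efficiency.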
The joint genotyping pipeline provided in Databricks Runtime 6.2 for Genomics handles sample manifests with thousands of entries more efficiently. In addition, the pipeline now handles missing gVCF blocks gracefully by inserting explicit no-calls.
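The no-call behavior can be pictured with a small sketch. It is not the pipeline's implementation: the block layout (`start`, `end`, `genotype` with 1-based inclusive coordinates), the sample names, and the `genotype_at` helper are all assumptions made for illustration. When no block in a sample's gVCF covers the site being genotyped, the sample receives an explicit `./.` instead of causing a failure or being silently dropped.

```python
NO_CALL = "./."


def genotype_at(blocks, pos):
    """Return the genotype of the block covering pos, or a no-call if none does."""
    for start, end, genotype in blocks:
        if start <= pos <= end:
            return genotype
    return NO_CALL


# Hypothetical per-sample gVCF blocks: (start, end, genotype), 1-based inclusive.
sample_blocks = {
    "sampleA": [(1, 99, "0/0"), (100, 100, "0/1"), (101, 200, "0/0")],
    "sampleB": [(1, 80, "0/0"), (120, 200, "0/0")],  # positions 81-119 are missing
}

# Genotype every sample at position 100.
row = {sample: genotype_at(blocks, 100) for sample, blocks in sample_blocks.items()}
print(row)  # {'sampleA': '0/1', 'sampleB': './.'}
```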
The VEP annotation pipeline included in Databricks Runtime for Genomics provides streamlined integration with LOFTEE.
Samtools 1.9 is now installed.