Databricks Runtime 6.3 for Genomics (Unsupported)
Databricks released this image in January 2020.
Databricks Runtime for Genomics (Databricks Runtime Genomics) is a variant of Databricks Runtime 6.3 (Unsupported) optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics.
For more information, including instructions for creating a Databricks Runtime for Genomics cluster and developing genomics applications, see Genomics guide.
Databricks Runtime 6.3 for Genomics is built on top of Databricks Runtime 6.3. For information on what’s new in Databricks Runtime 6.3, see the Databricks Runtime 6.3 (Unsupported) release notes.
Joint genotyping pipeline from Delta
The joint genotyping pipeline in Databricks Runtime 6.3 for Genomics can now take Delta tables written by the DNASeq pipeline as input. This lets you run the two pipelines together without exporting intermediate results to gVCFs.
Automatic annotation parsing when reading VCFs
The version of Glow included in Databricks Runtime 6.3 for Genomics automatically parses ANN INFO fields when reading VCFs. The INFO_ANN fields in the resulting DataFrames now have structured schemas for simplified querying.
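To illustrate what a structured annotation looks like compared with the raw INFO string, here is a minimal plain-Python sketch. It is not Glow's implementation. It assumes the ANN value follows the SnpEff/VCF annotation format (comma-separated annotations, 16 pipe-delimited sub-fields, multiple effects joined with `&`). The key names in `ANN_KEYS`, the function `parse_ann`, and the example string are hypothetical and chosen only for illustration; the actual field names in Glow's INFO_ANN schema may differ.

```python
from typing import Dict, List

# Hypothetical key names for the 16 pipe-delimited ANN sub-fields,
# in the order given by the SnpEff/VCF annotation format.
ANN_KEYS = [
    "allele", "annotation", "impact", "gene_name", "gene_id",
    "feature_type", "feature_id", "transcript_biotype", "rank",
    "hgvs_c", "hgvs_p", "cdna_pos", "cds_pos", "aa_pos",
    "distance", "messages",
]


def parse_ann(raw: str) -> List[Dict[str, object]]:
    """Split a raw ANN INFO value into one dict per annotation."""
    records = []
    for entry in raw.split(","):
        values = entry.split("|")
        # Pad short entries so every record has all 16 keys.
        values += [""] * (len(ANN_KEYS) - len(values))
        record: Dict[str, object] = {
            key: (value.strip() or None)
            for key, value in zip(ANN_KEYS, values)
        }
        # Several effects on one annotation are joined with "&".
        effects = record["annotation"]
        record["annotation"] = effects.split("&") if effects else []
        records.append(record)
    return records


# Made-up ANN value with two annotations for the same allele.
EXAMPLE_ANN = (
    "A|missense_variant|MODERATE|GENE1|G1|transcript|TX1|protein_coding"
    "|2/5|c.10C>A|p.Leu4Ile|10/900|10/600|4/200||,"
    "A|splice_region_variant&intron_variant|LOW|GENE1|G1|transcript|TX2"
    "|protein_coding|3/4|c.20+3C>A||||||"
)

records = parse_ann(EXAMPLE_ANN)

# With structured records, filtering by a sub-field is a simple lookup
# rather than string matching on the raw INFO value.
moderate = [r for r in records if r["impact"] == "MODERATE"]
```

Once annotations are structured this way, a filter such as "all annotations with MODERATE impact" becomes a field comparison instead of a regular expression over the raw string, which is the kind of simplification the structured INFO_ANN schema is meant to provide.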
Improved multiallelic variant splitter
The multiallelic variant splitter in Glow and Databricks Runtime for Genomics now handles more complex types of multiallelic sites. The new behavior mirrors the vt decompose command-line tool. In addition, you can now use the splitter as a standalone transformer by calling