Databricks Runtime 6.6 for Genomics (unsupported)
Databricks released this image in May 2020.
Databricks Runtime 6.6 for Genomics is a version of Databricks Runtime 6.6 (unsupported) optimized for working with genomic and biomedical data. It is a component of the Databricks Unified Analytics Platform for Genomics.
For more information, including instructions for creating a Databricks Runtime for Genomics cluster and for developing genomics applications, see Genomics guide.
Databricks Runtime 6.6 for Genomics is built on top of Databricks Runtime 6.6. For information on what’s new in Databricks Runtime 6.6, see the Databricks Runtime 6.6 (unsupported) release notes.
The version of Glow included in Databricks Runtime 6.6 for Genomics can read GFF3 files. The DataFrame
schema is inferred from the attributes present in the data. We added this feature in
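As a rough illustration of what attribute-driven schema inference means, the plain-Python sketch below builds a column list from whichever keys appear in the GFF3 attributes column (column 9). It is not Glow's implementation: the function names parse_attributes and infer_rows and the lowercase base column names are assumptions made for this example only.

```python
from urllib.parse import unquote

# The eight fixed GFF3 columns that precede the attributes column.
# These names are illustrative, not Glow's actual column names.
BASE_COLUMNS = ["seqid", "source", "type", "start", "end", "score", "strand", "phase"]


def parse_attributes(field):
    """Split a GFF3 column-9 string such as 'ID=gene1;Name=abc' into a dict."""
    attrs = {}
    for pair in field.strip().split(";"):
        if "=" not in pair:
            continue  # skip empty or malformed entries
        key, value = pair.split("=", 1)
        attrs[key.strip()] = unquote(value.strip())  # GFF3 values are URL-escaped
    return attrs


def infer_rows(lines):
    """Return (columns, rows), where columns depend on the attributes seen."""
    records = []
    attr_columns = []  # attribute keys in order of first appearance
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # not a feature line
        attrs = parse_attributes(fields[8])
        for key in attrs:
            if key not in attr_columns:
                attr_columns.append(key)
        records.append((fields[:8], attrs))

    columns = BASE_COLUMNS + attr_columns
    rows = []
    for base, attrs in records:
        row = dict(zip(BASE_COLUMNS, base))
        # Attributes missing from a feature become None in that row.
        row.update({key: attrs.get(key) for key in attr_columns})
        rows.append(row)
    return columns, rows
```

Two features with different attribute sets therefore produce one shared set of columns, with None where a feature lacks an attribute.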
Per-sample pipeline timeouts
tumor/normal pipelines now have an option to set a
BAM export option
tumor/normal pipelines now have an option to export to BAM.
Aligned reads can be exported as a single BAM or as sharded BAMs.
Manifests for the DNASeq,
joint genotyping pipelines can now be provided via a
blob as well as a path. If the manifest is provided via a blob, all paths must be absolute.
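The absolute-path requirement for blob manifests can be checked before a pipeline run. The sketch below is a hypothetical pre-flight check, not part of the pipelines: it assumes the manifest is CSV text with a header row and a file_path column, and it treats a path as absolute if it has a URI scheme (for example dbfs: or s3:) or starts with a slash. Adjust both assumptions to match the manifest format described in the Genomics guide.

```python
import csv
import io
from urllib.parse import urlparse


def is_absolute(path):
    """Assumed rule: a URI scheme or a leading '/' makes a path absolute."""
    return bool(urlparse(path).scheme) or path.startswith("/")


def relative_paths_in_blob(manifest_blob, path_column="file_path"):
    """Return the manifest paths that would not be valid in a blob manifest.

    manifest_blob is the manifest content as a string; path_column is the
    (assumed) name of the column that holds the file paths.
    """
    reader = csv.DictReader(io.StringIO(manifest_blob))
    return [row[path_column] for row in reader if not is_absolute(row[path_column])]
```

An empty result means every path in the blob is absolute under the rule assumed here.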
Variant normalizer flexibility
The Glow variant normalizer now accepts compressed reference sequences, such as block-gzipped
FASTA files. We added this improvement in open source.
Pipe transformer tolerates empty partitions
The Glow pipe transformer now ignores empty partitions, so users no longer have to
coalesce the input DataFrame beforehand. We added this improvement in open source.
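The behavior can be pictured with a small plain-Python sketch that pipes each partition through an external command and does not start the command for empty partitions. This is a conceptual model only, not Glow's code: pipe_partitions, the list-of-lists partition layout, and the uppercase command are all assumptions made for the example.

```python
import subprocess
import sys

# Stand-in for an external tool: a command that uppercases its stdin.
UPPERCASE_CMD = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def pipe_partitions(partitions, cmd):
    """Send each partition's rows to cmd on stdin and collect its stdout lines.

    Empty partitions are skipped without launching the command, so the
    caller does not need to merge partitions first.
    """
    results = []
    for rows in partitions:
        if not rows:
            results.append([])  # nothing to pipe for an empty partition
            continue
        completed = subprocess.run(
            cmd,
            input="\n".join(rows) + "\n",
            capture_output=True,
            text=True,
            check=True,  # raise if the command exits with an error
        )
        results.append(completed.stdout.splitlines())
    return results
```

Calling pipe_partitions([["a", "b"], [], ["c"]], UPPERCASE_CMD) runs the command twice, once for each non-empty partition.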
Duplicate marking performance
Duplicate marking during the read alignment stage of the DNASeq
pipeline is now faster and requires less memory.
emitAllAlleles options have been removed from the
joint genotyping pipeline.
The following libraries included in Databricks Runtime 6.6 for Genomics differ from those included in Databricks Runtime 6.6.