ADAM
ADAM is a library for genomic data processing on Apache Spark. It is used to build pipelines that operate on genomic read data stored in formats such as BAM, SAM, and CRAM.
To use ADAM in Databricks:
Launch a Databricks Runtime cluster with these Spark configurations:
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.kryo.registrator org.bdgenomics.adam.serialization.ADAMKryoRegistrator
# Hadoop configs
spark.hadoop.hadoopbam.bam.enable-bai-splitter true
Install the cluster libraries:
Maven:
org.bdgenomics.adam:adam-apis-spark3_2.12:<version>
PyPI:
bdgenomics.adam
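The setup steps above can also be written down as data, which may help if the cluster is created through automation rather than the UI. The following is a minimal sketch, not an official client: it assumes the Kryo serializer class is assigned to the standard `spark.serializer` property, that the cluster specification accepts Spark properties in a `spark_conf` map, and that libraries are described as `maven`/`coordinates` and `pypi`/`package` entries. The helper names `adam_libraries` and `spark_conf_text` are hypothetical, and the ADAM version is left as a caller-supplied placeholder.

```python
import json

# Spark properties from the configuration step above.
# Assumption: the Kryo serializer class is the value of `spark.serializer`.
ADAM_SPARK_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.kryo.registrator": "org.bdgenomics.adam.serialization.ADAMKryoRegistrator",
    # Hadoop config, passed through Spark with the `spark.hadoop.` prefix
    "spark.hadoop.hadoopbam.bam.enable-bai-splitter": "true",
}


def adam_libraries(adam_version):
    """Return library specs for the Maven and PyPI packages listed above.

    `adam_version` is supplied by the caller; this sketch does not
    assume any particular ADAM release.
    """
    coordinates = f"org.bdgenomics.adam:adam-apis-spark3_2.12:{adam_version}"
    return [
        {"maven": {"coordinates": coordinates}},
        {"pypi": {"package": "bdgenomics.adam"}},
    ]


def spark_conf_text(conf):
    """Render the properties as one `key value` pair per line."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


if __name__ == "__main__":
    # Text suitable for pasting into the cluster's Spark config field
    print(spark_conf_text(ADAM_SPARK_CONF))
    # Library specs, with the version still a placeholder
    print(json.dumps(adam_libraries("<version>"), indent=2))
```

Keeping the properties in a single dictionary means the same values can be rendered as text for the UI or embedded in a JSON cluster specification without retyping them.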