Hail 0.2

Hail is a library built on Spark for analyzing large genomic datasets. Hail 0.2 is integrated into the Databricks genomics runtime to simplify and scale your genomic analyses.

Important

Hail 0.2 and its integration with Databricks are both experimental. Interfaces inside Hail are likely to change, as are properties of the Databricks environment, such as which Python packages are available by default.

Create a Hail cluster

To create a cluster with Hail installed:

  1. In the Custom Spark Version field, paste in the version key for the Databricks genomics runtime.

  2. Set the following environment variable:

    ENABLE_HAIL=true
    

    This environment variable causes the cluster to launch with Hail 0.2, its dependencies, and Python 3.6 installed.
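The steps above use the cluster creation UI. If you create clusters through the Databricks Clusters REST API (POST /api/2.0/clusters/create) instead, the same two settings map onto the spark_version and spark_env_vars fields of the request body. The following is a minimal sketch that only builds that JSON payload, assuming you use that API. The runtime key and node type are placeholders, and the helper name hail_cluster_spec is hypothetical.

```python
import json

# Placeholder: substitute the version key of the Databricks genomics runtime
# (the value you would paste into the Custom Spark Version field).
GENOMICS_RUNTIME_KEY = "<genomics-runtime-version-key>"


def hail_cluster_spec(cluster_name, node_type_id, num_workers,
                      spark_version=GENOMICS_RUNTIME_KEY):
    """Build a hypothetical clusters/create request body with Hail enabled."""
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Same setting as the ENABLE_HAIL environment variable in step 2.
        "spark_env_vars": {"ENABLE_HAIL": "true"},
    }


# "<node-type-id>" is a placeholder for a node type available in your workspace.
spec = hail_cluster_spec("hail-cluster", "<node-type-id>", 2)
print(json.dumps(spec, indent=2))
```

You would then send the printed JSON as the body of the create request with whatever HTTP client or CLI you already use.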

Use Hail in a notebook

For the most part, Hail 0.2 code runs in Databricks exactly as described in the Hail documentation. However, a few modifications are necessary for the Databricks environment.

Initialization

When initializing Hail, you must pass in the pre-created SparkContext and mark the initialization as idempotent, so that re-running the cell does not try to initialize Hail a second time.

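import hail as hl
# sc is the SparkContext that Databricks pre-creates for the notebook;
# idempotent=True makes repeated calls a no-op once Hail is initialized.
hl.init(sc, idempotent=True)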

Plotting

Hail uses the Bokeh library to create plots. Bokeh's built-in show function does not work in Databricks. To display a Bokeh plot generated by Hail, render it to standalone HTML and pass the result to displayHTML, for example:

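from bokeh.embed import file_html
from bokeh.resources import CDN

# Assumes mt is a Hail MatrixTable with an entry field DP (depth).
plot = hl.plot.histogram(mt.DP, range=(0, 30), bins=30, title='DP Histogram', legend='DP')
# Render the plot as a standalone HTML page that loads BokehJS from the CDN.
html = file_html(plot, CDN, "Chart")
# displayHTML is the Databricks notebook built-in for rendering raw HTML.
displayHTML(html)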

See Bokeh in Python Notebooks for more information.
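If you plot often, the file_html and displayHTML pair from the example above can be wrapped in a small helper. The following is a sketch, assuming it is defined in a Databricks notebook where displayHTML exists. The name show_bokeh and its display parameter are hypothetical, and the parameter exists only so the function can be exercised outside Databricks.

```python
from bokeh.embed import file_html
from bokeh.resources import CDN


def show_bokeh(plot, title="Chart", display=None):
    """Render a Bokeh plot as standalone HTML and display it.

    display defaults to the Databricks notebook built-in displayHTML;
    pass another callable (for example print) when running elsewhere.
    Returns the generated HTML string.
    """
    # Standalone HTML page that loads BokehJS from the Bokeh CDN.
    html = file_html(plot, CDN, title)
    if display is None:
        display = displayHTML  # only defined inside Databricks notebooks
    display(html)
    return html
```

With this helper, show_bokeh(plot, title="DP Histogram") would replace the last two lines of the earlier example.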

Limitations

  • When Hail support is enabled, your cluster uses Python 3.6, so notebooks written for other versions of Python may not work.
  • When Hail support is enabled, fewer Python libraries are installed by default. You can still use the Libraries feature to install new libraries.
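Because of these limitations, it can help to check the environment before running an existing notebook on a Hail-enabled cluster. The following is a minimal, standard-library-only sketch. The helper names missing_packages and python_is and the package list are hypothetical; replace the list with the libraries your notebook actually imports.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the module names in names that cannot be found on this cluster."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised, for example, for a dotted name whose parent package is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


def python_is(major, minor):
    """True if the running interpreter is Python major.minor."""
    return sys.version_info[:2] == (major, minor)


# Hypothetical list: replace with the libraries your notebook imports.
needed = ["numpy", "pandas", "bokeh"]
print("Python 3.6:", python_is(3, 6))
print("Missing (install via Libraries):", missing_packages(needed))
```

Anything reported as missing can then be installed with the Libraries feature before you run the rest of the notebook.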

Example

After you’ve set up a Hail cluster, check out the Hail overview notebook.

Hail Overview
