Pipeline dependencies
Delta Live Tables supports external dependencies in your pipelines. Databricks recommends using one of two patterns to install Python packages:
- Use the %pip install command to install packages for all source files in a pipeline.
- Import modules or libraries from source code stored in workspace files, as sketched after this list. See Import Python modules from workspace files.
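The following is a minimal sketch of the second pattern. The workspace directory, module name, and helper function are hypothetical; adjust them to your own layout:

import sys

# Hypothetical workspace directory containing a helper module my_utils.py.
sys.path.append("/Workspace/Shared/dlt_helpers")

# Hypothetical helper function defined in my_utils.py.
from my_utils import normalize_columns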
Delta Live Tables also supports using global and cluster-scoped init scripts. However, these external dependencies, particularly init scripts, increase the risk of issues with runtime upgrades. To mitigate these risks, minimize the use of init scripts in your pipelines. If your processing requires init scripts, Databricks recommends automating your pipeline tests and increasing your testing frequency so that problems are detected early.
Python libraries
To specify external Python libraries, use the %pip install magic command. When an update starts, Delta Live Tables runs all cells containing a %pip install command before running any table definitions. Every Python notebook included in the pipeline has access to all installed libraries. The following example installs the numpy library and makes it globally available to any Python notebook in the pipeline:
%pip install numpy
import numpy as np
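Once installed, the library can be used inside table definitions anywhere in the pipeline. The following is a minimal sketch; the upstream dataset raw_values and its value column are hypothetical:

import dlt
import numpy as np
from pyspark.sql import functions as F

@dlt.table(comment="Scales values by a constant computed with numpy")
def scaled_values():
    # raw_values is a hypothetical upstream dataset in the same pipeline.
    df = dlt.read("raw_values")
    # Compute a constant on the driver with numpy, then apply it as a column expression.
    factor = float(np.sqrt(2))
    return df.withColumn("scaled", F.col("value") * factor)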
To install a Python wheel package, add the wheel path to the %pip install command. Installed Python wheel packages are available to all tables in the pipeline. The following example installs a wheel named dltfns-1.0-py3-none-any.whl from the DBFS directory /dbfs/dlt/:
%pip install /dbfs/dlt/dltfns-1.0-py3-none-any.whl
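After the wheel is installed, its modules can be imported in any notebook in the pipeline. The following is a hedged sketch that assumes the wheel provides a module named dltfns exposing a helper function enrich; both names are hypothetical, as is the upstream dataset raw_data:

import dlt

# Hypothetical module and function packaged in the installed wheel.
from dltfns import enrich

@dlt.table
def enriched_data():
    # raw_data is a hypothetical upstream dataset defined elsewhere in the pipeline.
    return enrich(dlt.read("raw_data"))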