Pipeline dependencies
Delta Live Tables supports external dependencies in your pipelines. Databricks recommends using one of two patterns to install Python packages:
- Use the %pip install command to install packages for all source files in a pipeline.
- Import modules or libraries from source code stored in workspace files (see the sketch after this list). See Import Python modules from Databricks repos.
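The following is a minimal sketch of the second pattern. The workspace path, module name, and helper function are hypothetical; they depend on where your source code is stored and what it defines.

import sys

# Hypothetical workspace directory that contains your Python modules.
sys.path.append("/Workspace/Repos/my_user/my_repo/utils")

# Hypothetical module and helper function defined in that directory.
from data_cleaning import clean_column_names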
Delta Live Tables also supports using global and cluster-scoped init scripts. However, these external dependencies, particularly init scripts, increase the risk of issues with runtime upgrades. To mitigate these risks, minimize the use of init scripts in your pipelines. If your processing requires init scripts, automate testing of your pipeline and increase your testing frequency to detect problems early.
Important
Init scripts are not supported with Unity Catalog-enabled pipelines. See Limitations.
Python libraries
To specify external Python libraries, use the %pip install magic command. When an update starts, Delta Live Tables runs all cells containing a %pip install command before running any table definitions. Every Python notebook included in the pipeline shares a library environment and has access to all installed libraries.
Important
Because every notebook in a pipeline shares a library environment, you cannot define different library versions in a single pipeline. If your processing requires different library versions, you must define them in different pipelines.
The following example installs the numpy library and makes it globally available to any Python notebook in the pipeline:
%pip install numpy
import numpy as np
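Any notebook in the pipeline can then use the installed library in its table definitions. The sketch below assumes a hypothetical upstream dataset named raw_readings with a numeric value column.

import dlt
import numpy as np
from pyspark.sql import functions as F

@dlt.table(comment="Uses the numpy library installed with %pip install")
def scaled_readings():
    # numpy is available here because the %pip install cell ran when the update started.
    scale = float(np.log10(100))
    # raw_readings is a hypothetical dataset defined elsewhere in the pipeline.
    return dlt.read("raw_readings").withColumn("scaled_value", F.col("value") * scale)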
To install a Python wheel package, add the Python wheel path to the %pip install command. Installed Python wheel packages are available to all tables in the pipeline. The following example installs a Python wheel named dltfns-1.0-py3-none-any.whl from the DBFS directory /dbfs/dlt/:
%pip install /dbfs/dlt/dltfns-1.0-py3-none-any.whl
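Once installed, the wheel's modules can be imported from any notebook in the pipeline. The module and function names below are assumptions for illustration; use whatever the wheel actually packages.

import dlt

# Hypothetical module and helper packaged in dltfns-1.0-py3-none-any.whl.
from dltfns import normalize_names

@dlt.table(comment="Uses a helper from the installed Python wheel")
def cleaned_customers():
    # raw_customers is a hypothetical upstream dataset in the pipeline.
    return normalize_names(dlt.read("raw_customers"))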